1  Introduction


Many  of  the  most  successful  quantum  algorithms  are  designed  around  symmetries,  for  which 
group representation theory provides the mathematical foundation. These algorithms have
traditionally achieved their speedups with the quantum Fourier transform (QFT), but this is not the
only  method  known  to  exploit  group  symmetries.  One  concept  which  has  been  productive  in 
mathematics,  chemistry,  physics,  and  recently  quantum  information  theory,  is  known  as  Schur  (or 
Schur-Weyl)  duality.  Early  in  this  project  we  gave  an  efficient  quantum  circuit,  which  we  call  the 
Schur  transform  by  analogy  to  the  QFT,  for  transforming  quantum  data  between  two  different 
forms:  the  standard  computational  basis  and  the  Schur  basis.  This  allows  quantum  computers 
to efficiently compute using the Schur symmetries of quantum information. While this already
has  applications  to  quantum  communication,  one  of  our  main  goals  is  to  find  algorithmic  uses  of 
the  transform.  We  are  also  looking  at  ways  of  using  Schur  symmetry  in  a  purely  mathematical 
sense  to  construct  quantum  algorithms,  so  that  Schur  duality  would  be  used  in  the  analysis  of  the 
algorithm  but  its  implementation  would  not  explicitly  use  the  Schur  transform. 

We  report  the  following  major  accomplishments  over  the  span  of  this  project  timeline,  from 
09/01/05  to  08/31/08: 

•  Efficient  circuit  for  the  Schur  transform  devised  -  Phys.  Rev.  Lett.,  vol.  97,  pp.  170502, 2006. 

•  Qudit  version  of  Schur  transform  devised  -  Proc.  18th  ACM-SIAM  Symposium  on  Discrete 
Algorithms  (SODA),  pp.  1235-1244, 2007. 

•  Schur transform applied to hidden subgroup problem - Proc. 24th Symposium on Theoretical
Aspects of Computer Science (STACS 2007), Lecture Notes in Computer Science 4393, pp. 598-609,
2007.

•  Quantum expanders developed - Q. Inf. Comp., vol. 8, no. 8/9, pp. 715-721, 2008. [arXiv:0709.1142]

•  Analysis of random quantum circuits - Comm. Math. Phys., vol. 291, no. 1, pp. 257-302, 2009.
[arXiv:0802.1919]

•  Study of tensor product expanders - Q. Inf. Comp., vol. 9, pp. 336-360, 2009. [arXiv:0804.0011]

•  New quantum algorithm for superpolynomial speedups based on quantum circuits - Proc.
of the 35th International Colloquium on Automata, Languages and Programming (ICALP 2008),
LNCS 5125, pp. 782-795, 2008. [arXiv:0805.0007]

Major  publications  detailing  these  results,  specifically  including  those  cited  here,  appear  in  the 
appendix  of  this  report. 

2  Project  motivation  and  summary  of  scientific  results 

The  main  goal  of  this  project  was  to  develop  new  quantum  algorithms,  based  on  the  application 
of  the  Schur  transform,  as  a  new  building  block,  different  from  the  quantum  Fourier  transform. 

We  analyzed  a  natural  strategy  for  using  the  Schur  transform  to  solve  the  hidden  subgroup 
problem  (HSP;  a  common  framework  for  quantum  speedups),  and  found  that  it  failed  to  give  an 
exponential  speedup.  Along  the  way,  we  gave  upper  and  lower  bounds  on  the  complexity  of  the 
quantum collision problem, which are tight for oracle complexity and nearly tight for time
complexity. (Here we mean a quantum generalization of the classical collision problem from
cryptography, in which we want to distinguish a uniform distribution on an unknown N elements from
one  on  an  unknown  2N  elements.)  This  work  was  published  in  STACS. 

We also looked beyond the HSP for problems where quantum computers can exhibit exponential,
or at least superpolynomial, speedups over classical computers. Initially, we found a problem
which cannot be solved on a classical computer in polynomial time, but which can be solved
quantumly using the QFT over the symmetric group, which is closely related to the Schur transform,
and  for  which  no  previous  application  was  known.  By  generalizing  our  construction,  we  found 
that  these  sorts  of  speedups  can  in  fact  be  obtained  from  any  efficiently  implementable  QFT  (i.e. 
over  any  finite  group),  or  even  from  most  random  circuits  that  are  sufficiently  long.  On  the  one 
hand,  this  shows  that  the  group  symmetry  was  less  important  than  many  people  initially  believed. 
On  the  other,  it  means  we  have  constructed  a  large  class  of  superpolynomial  quantum  speedups 
which  look  radically  unlike  the  speedups  based  on  the  HSP. 

One  of  the  building  blocks  of  the  Schur  transform  was  a  quantum  Clebsch-Gordan  transform 
for the unitary group. If this could be generalized to a quantum Fourier transform on the unitary
group, it could have many applications to dealing with unitary symmetries on quantum computers.
However, we have not yet been successful at turning classical circuits for the unitary group
Fourier transform [4] into quantum circuits for the unitary group QFT. The difficulty is that,
unlike simpler Fourier transforms, the unitary group Fourier transform involves intermediate steps
which,  if  implemented  on  a  quantum  computer,  would  not  preserve  the  overall  normalization  of 
the  state.  Thus,  these  transformations  would  either  be  physically  impossible,  or  would  have  an 
unacceptably  high  failure  rate.  This  only  means  that  our  first  approach  failed,  however,  and  not 
that  an  efficient  quantum  algorithm  is  ruled  out.  We  are  currently  investigating  other  ways  to 
approach  the  problem,  mostly  involving  technical  changes  in  how  the  data  is  represented,  as  well 
as  examining  different  recursive  decompositions  of  the  Legendre  transform  that  is  at  the  heart  of 
the  problem. 

While  we  have  not  yet  constructed  an  efficient  QFT  over  the  unitary  group,  we  have  found  one 
important application that would be made possible by such a QFT. Classically, expander graphs
are an extremely useful algorithmic tool, with applications in error-correcting codes, network
design, probabilistically checkable proofs, pseudorandomness, cryptographic hash functions, and
other fields. Only recently, a definition of a quantum expander was proposed, and applications
were given to cryptography [5] and to condensed matter physics [6]. However, no efficient
implementations of quantum expanders are currently known. We found a method to implement a quantum
expander that would be efficient if rotations in high-dimensional irreps of SU(2) could be
efficiently simulated on a quantum computer. This task would in turn be efficiently implementable
if  a  QFT  over  SU(2)  could  be  efficiently  carried  out  on  a  quantum  computer.  On  the  other  hand, 
a  direct  implementation  of  rotations  in  SU(2)  irreps  was  claimed.  The  method  there  turns  out  to 
be  missing  some  crucial  steps,  which  we  have  worked  to  fill  in.  Doing  so  gives  the  only  known 
efficient  construction  of  a  quantum  expander. 

Finally,  we  also  investigated  Clebsch-Gordan  transforms  over  groups  other  than  the  unitary 
group, and have constructed explicit efficient circuits for the dihedral and Heisenberg groups.
Cascading them will allow the construction of circuits that are analogous to the Schur transform, but
with the dihedral or Heisenberg group in place of the unitary group. We have investigated
variants of the HSP for which these circuits might be useful.

Our work on superpolynomial speedups also led us to investigate the properties of short random
quantum circuits (a.k.a. pseudorandom unitaries), and the extent to which they approximate
the behavior of fully random unitary matrices. The superpolynomial speedup mentioned above
can be obtained by analyzing the second moment of a family of pseudorandom unitaries (inspired
by the techniques in [8]), but we expect better constructions and additional applications (efficient
methods of randomizing quantum states, or of constructing unknown quantum states from oracles)
to arise from studying their higher moments. Ideally, the results would be analogous to the
classical case, where polynomial-size random circuits approximate random functions to all orders
(in other words, achieving nearly t-wise independence, for any t) as the circuit size increases. For
this project, the quantum circuits that are constructed would not explicitly use the Schur
transform; instead it is our analysis of the circuit that makes use of Schur duality.

Abstract: Given a universal gate set on two qubits, it is well known that applying
random gates from the set to random pairs of qubits will eventually yield an approximately
Haar-distributed unitary. However, this requires exponential time. We show that
random circuits of only polynomial length will approximate the first and second moments
of the Haar distribution, thus forming approximate 1- and 2-designs. Previous constructions
required longer circuits and worked only for specific gate sets. As a corollary of
our main result, we also improve previous bounds on the convergence rate of random
walks on the Clifford group.


1.  Introduction:  Pseudo-Random  Quantum  Circuits 

There  are  many  examples  of  algorithms  that  make  use  of  random  states  or  unitary 
operators  (e.g.  [5,28]).  However,  exactly  sampling  from  the  uniform  Haar  distribution 
is  inefficient.  In  many  cases,  though,  only  pseudo-random  operators  are  required.  To 
quantify the extent to which the pseudo-random operators behave like the uniform distribution,
we use the notion of k-designs (often referred to as t-designs). A k-design has
kth  moments  equal  to  those  of  the  Haar  distribution.  For  most  uses  of  random  states  or 
unitaries,  this  is  sufficient.  Constructions  of  exact  k-designs  on  states  are  known  (see  [3] 
and  references  therein)  and  some  are  efficient.  Ambainis  and  Emerson  [3]  introduced 
the  notion  of  approximate  state  k-designs,  which  can  be  implemented  efficiently  for 
any k. However, the known constructions of unitary k-designs are inefficient to implement.
Approximate unitary 2-designs have been considered [10,14,18], although the
approaches  are  specific  to  2-designs. 

We  consider  a  general  class  of  random  circuits  where  a  series  of  two-qubit  gates  are 
chosen  from  a  universal  gate  set.  We  give  a  framework  for  analysing  the  kth  moments 
of  these  circuits.  Our  conjecture,  based  on  an  analogous  classical  result  [23],  is  that  a 
random circuit on n qubits of length poly(n, k) is an approximate k-design. While we
do not prove this, we instead give a tight analysis of the k = 2 case. We find that in a
broad class of natural random circuit models (described in Sect. 1.1), a circuit of length
$O(n(n + \log 1/\epsilon))$ yields an $\epsilon$-approximate 2-design. Our definition of an approximate
k-design is in Sect. 2.2. Our results also apply to an alternative definition of an approximate
2-design from [10], for which we show random circuits of length $O(n(n + \log 1/\epsilon))$
yield $\epsilon$-approximations, thus extending the results of that paper to a larger class of
circuits. Moreover, our results also apply to random stabiliser circuits, meaning that a
random stabiliser circuit of length $O(n(n + \log 1/\epsilon))$ will be an $\epsilon$-approximate 2-design.
This both simplifies the construction and tightens the efficiency of the approach of [14],
which constructed $\epsilon$-approximate 2-designs in time $O(n^6(n^2 + \log 1/\epsilon))$ using $O(n^3)$
elementary quantum gates.

Fig. 1. An example of a random circuit. Different lines indicate a different gate is applied at each step


1.1. Random Circuits. The random circuit we will use is the following. Choose a 2-qubit
gate set that is universal on U(4) (or on the stabiliser subgroup of U(4)). One example
of this is the set of all one qubit gates together with the controlled-NOT gate. Another
is simply the set of all of U(4). Then, at each step, choose a random pair of qubits and
apply a gate from the universal set chosen uniformly at random. For the U(4) case, the
distribution will be the Haar measure on U(4). One such circuit is shown in Fig. 1 for
n = 4 qubits. This is based on the approach used in Refs. [9,26] but our analysis is both
simpler and more general.
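
The sampling procedure just described can be sketched in a few lines of NumPy. This is only an illustration for the U(4) case, not code from the paper: the function names (`haar_unitary`, `apply_two_qubit_gate`, `random_circuit`) are hypothetical, and the Haar sampler uses the standard QR-of-a-Gaussian-matrix method with a phase correction, which is an assumption about how one might implement the Haar measure numerically.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a d x d unitary from the Haar measure via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Remove the phase ambiguity of the QR decomposition (column j is rescaled).
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases

def apply_two_qubit_gate(W, gate, i, j, n):
    """Left-multiply the 2^n x 2^n matrix W by a 4 x 4 gate acting on qubits i and j."""
    T = W.reshape([2] * n + [2 ** n])      # n qubit (row) axes plus one column axis
    g = gate.reshape(2, 2, 2, 2)           # axes: (out_i, out_j, in_i, in_j)
    T = np.tensordot(g, T, axes=([2, 3], [i, j]))
    T = np.moveaxis(T, [0, 1], [i, j])     # put the output axes back in place
    return T.reshape(2 ** n, 2 ** n)

def random_circuit(n, t, rng):
    """Apply t Haar-random U(4) gates, each to a uniformly random pair of qubits."""
    W = np.eye(2 ** n, dtype=complex)
    for _ in range(t):
        i, j = rng.choice(n, size=2, replace=False)
        W = apply_two_qubit_gate(W, haar_unitary(4, rng), i, j, n)
    return W

rng = np.random.default_rng(0)
W = random_circuit(n=4, t=20, rng=rng)     # a 16 x 16 unitary
```

The dense-matrix representation is chosen only for readability; it costs memory exponential in n and is meant for small experiments.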

Since the universal set can generate the whole of $U(2^n)$ in this way, such random
circuits can produce any unitary. Further, since this process converges to a unitarily
invariant distribution and the Haar distribution is unique, the resulting unitary must be
uniformly distributed amongst all unitaries [15]. Therefore this process will eventually
converge to a Haar distributed unitary from $U(2^n)$. This is proven rigorously in Lemma 3.2.
However, a generic element of $U(2^n)$ has $4^n$ real parameters, and thus to even have
$\Omega(4^{-n})$ fidelity with the Haar distribution requires $\Omega(4^n)$ 2-qubit unitaries. We address
this problem by considering only the lower-order moments of the distribution and showing
these are nearly the same for random circuits as for Haar-distributed unitaries. This
claim is formally described in Theorem 2.2.

Our paper is organised as follows. In Sect. 2 we define unitary k-designs and explain
how a random circuit could be used to construct a k-design. In Sect. 3 we work out how
the  state  evolves  after  a  single  step  of  the  random  circuit.  We  then  extend  this  to  multiple 
steps  in  Sect.  4  and  prove  our  general  convergence  results.  A  key  simplification  will  be 
(following  [26])  to  map  the  evolution  of  the  second  moments  of  the  quantum  circuit  onto 
a  classical  Markov  chain.  We  then  prove  a  tight  convergence  result  for  the  case  where 
the  gates  are  chosen  from  U  (4)  in  Sect.  5.  This  section  contains  most  of  the  technical 
content  of  the  paper.  Using  our  bounds  on  mixing  time  we  put  together  the  proof  that 
random  circuits  yield  approximate  unitary  2-designs  in  Sect.  6.  Section  7  concludes  with 
some  discussion  of  applications. 

2.  Preliminaries

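2.1. Pauli expansion. Much of the following will be done in the Pauli basis. The Pauli
operators will be taken as $\{\sigma_0, \sigma_1, \sigma_2, \sigma_3\}$ and defined to be

\[
\sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]

If $|\psi\rangle \in \mathbb{C}^{2^n}$ is a state on n qubits then we write $\psi = |\psi\rangle\langle\psi|$. We can expand $\psi$ in the
Pauli basis as

\[
\psi = 2^{-n/2} \sum_p \gamma(p)\, \sigma_p, \tag{2.1}
\]

where $\sigma_p = \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_n}$ for the string $p = p_1 \cdots p_n$. Inverting, the coefficients
$\gamma(p)$ are given by

\[
\gamma(p) = 2^{-n/2} \operatorname{tr} \sigma_p \psi. \tag{2.2}
\]

It is easy to show that the coefficients $\gamma(p)$ are real and, with the chosen normalisation,
the squares sum to $\operatorname{tr} \psi^2$, which is 1 for pure $\psi$. In general

\[
\sum_p \gamma^2(p) \le 1
\]

with equality if and only if $\psi$ is pure. Note also that $\operatorname{tr} \psi = 1$ is equivalent to
$\gamma(0) = 2^{-n/2}$.

This notation is extended to states on nk qubits by treating $\gamma$ as a function of k strings
from $\{0, 1, 2, 3\}^n$. Thus a state $\rho$ on nk qubits is written as

\[
\rho = 2^{-nk/2} \sum_{p_1, \ldots, p_k} \gamma_0(p_1, \ldots, p_k)\, \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k}. \tag{2.3}
\]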
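
As a quick numerical illustration of the Pauli expansion for a single system (Eqs. 2.1 and 2.2), the sketch below computes the coefficients of a random pure state on n = 2 qubits and rebuilds the state from them. The helper names (`pauli_string`, `pauli_coefficients`) and the choice of test state are illustrative assumptions, not part of the paper.

```python
import itertools
from functools import reduce
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_string(p):
    """Tensor product sigma_{p_1} x ... x sigma_{p_n} for a tuple p with entries in {0,1,2,3}."""
    return reduce(np.kron, [PAULIS[a] for a in p])

def pauli_coefficients(op, n):
    """Coefficients gamma(p) = 2^{-n/2} tr(sigma_p op) for every Pauli string p."""
    return {p: 2 ** (-n / 2) * np.trace(pauli_string(p) @ op)
            for p in itertools.product(range(4), repeat=n)}

n = 2
rng = np.random.default_rng(1)
v = rng.standard_normal(2 ** n) + 1j * rng.standard_normal(2 ** n)
v /= np.linalg.norm(v)
psi = np.outer(v, v.conj())                      # pure state |psi><psi|

gamma = pauli_coefficients(psi, n)
sum_of_squares = sum(abs(g) ** 2 for g in gamma.values())
rebuilt = 2 ** (-n / 2) * sum(g * pauli_string(p) for p, g in gamma.items())
```

For this pure state the coefficients come out real, their squares sum to 1, and the identity coefficient equals $2^{-n/2}$, as stated above.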


2.2. k-designs. We will say that a k-design is efficient if the effort required to sample
a  state  or  unitary  from  the  design  is  polynomial  in  n  and  k.  Note  that  we  do  not  require 
the  number  of  states  to  be  polynomial  because,  even  for  approximate  unitary  designs,  an 
exponential  number  of  unitaries  is  required.  Rather,  the  number  of  random  bits  needed 
to  specify  an  element  of  the  design  should  be  poly(n,  k). 

2.2.1.  State  designs  A  (state)  k-design  is  an  ensemble  of  states  such  that,  when  one  state 
is  chosen  from  the  ensemble  and  copied  k  times,  it  is  indistinguishable  from  a  uniformly 
random  state.  This  is  a  way  of  quantifying  the  pseudo-randomness  of  the  state  and  is 
a quantum analogue of k-wise independence. Hayashi et al. [20] give an inefficient
construction  of  k-designs  for  any  n  and  k. 

The  state  k-design  definition  we  use  is  due  to  Ref.  [3]: 

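Definition 2.1. An ensemble of quantum states $\{p_i, |\psi_i\rangle\}$ is a state k-design if

\[
\sum_i p_i \left( |\psi_i\rangle\langle\psi_i| \right)^{\otimes k} = \int_\psi \left( |\psi\rangle\langle\psi| \right)^{\otimes k} d\psi, \tag{2.4}
\]

where the integration is taken over the left invariant Haar measure on the unit sphere
in $\mathbb{C}^d$, normalised so that $\int_\psi d\psi = 1$.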

The integral in Eq. 2.4 is proportional to the projector onto
the symmetric subspace of k d-dimensional spaces. For a rigorous proof, see Ref. [16]
and for a less precise proof, but from a quantum information perspective, see Ref. [7].
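
To make Definition 2.1 concrete, the following sketch checks numerically that a small ensemble forms a state 2-design on one qubit. It relies on two facts that are standard but not stated in the text above, so treat them as assumptions of the example: the Haar integral in Eq. 2.4 equals the projector onto the symmetric subspace divided by that subspace's dimension, and the six eigenstates of the Pauli operators (with uniform weights) are a known example of a qubit state 2-design.

```python
import numpy as np

s = 1 / np.sqrt(2)
# Eigenstates of sigma_3, sigma_1 and sigma_2, taken with uniform weights p_i = 1/6.
states = [
    np.array([1, 0], dtype=complex),
    np.array([0, 1], dtype=complex),
    s * np.array([1, 1], dtype=complex),
    s * np.array([1, -1], dtype=complex),
    s * np.array([1, 1j]),
    s * np.array([1, -1j]),
]

def k_copies(v, k):
    """Return (|v><v|)^{tensor k} as a dense matrix."""
    rho = np.outer(v, v.conj())
    out = rho
    for _ in range(k - 1):
        out = np.kron(out, rho)
    return out

# Left-hand side of Eq. 2.4 for k = 2.
ensemble_average = sum(k_copies(v, 2) for v in states) / len(states)

# Right-hand side: normalised projector onto the symmetric subspace of two qubits.
swap = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
sym_projector = (np.eye(4) + swap) / 2
haar_average = sym_projector / np.trace(sym_projector).real

is_two_design = np.allclose(ensemble_average, haar_average)
```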

2.2.2. Unitary designs A unitary k-design is, in a sense, a stronger version of a state
design. Just as applying a Haar-random unitary to an arbitrary pure state results in a
uniformly random pure state, applying a unitary chosen from a unitary k-design to an
arbitrary pure state should result in a state k-design. Another way to say this is that the
state obtained by acting $U^{\otimes k}$, where U is drawn from a unitary k-design on U(d), on
any $d^k$-dimensional state should be indistinguishable from the case where U is drawn
uniformly from U(d). Formally, we have:

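Definition 2.2. Let $\{p_i, U_i\}$ be an ensemble of unitary operators. Define

\[
\mathcal{G}_W(\rho) = \sum_i p_i\, U_i^{\otimes k} \rho\, (U_i^\dagger)^{\otimes k} \tag{2.5}
\]

and

\[
\mathcal{G}_H(\rho) = \int_U U^{\otimes k} \rho\, (U^\dagger)^{\otimes k}\, dU. \tag{2.6}
\]

Then the ensemble is a unitary k-design iff $\mathcal{G}_W = \mathcal{G}_H$.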

Unitary designs can also be defined in terms of polynomials, so that if p is a polynomial
with degree k in the matrix elements of U and k in the matrix elements of $U^*$, then
averaging p over a unitary k-design should give the same answer as averaging over the Haar
measure. To see the equivalence with Definition 2.2 note that averaging a monomial over
our ensemble can be expressed as
$\langle i_1, \ldots, i_k | \mathcal{G}_W(|j_1, \ldots, j_k\rangle\langle j'_1, \ldots, j'_k|) | i'_1, \ldots, i'_k \rangle$,
and so if $\mathcal{G}_W = \mathcal{G}_H$ then any polynomial of degree k will have the same expectation
over both distributions.
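
A small numerical check of Definition 2.2 may help fix ideas. The sketch below applies the map of Eq. 2.5 for the uniform ensemble over the four single-qubit Pauli operators and compares it with the Haar map of Eq. 2.6. The closed forms used for the Haar side (the maximally mixed state for k = 1, and the normalised symmetric projector for the input $|00\rangle\langle 00|$ with k = 2) are standard facts assumed here, and the function name `twirl` is hypothetical.

```python
from functools import reduce
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def twirl(unitaries, rho, k):
    """Uniform-ensemble version of Eq. 2.5: average of U^{(x)k} rho (U^dagger)^{(x)k}."""
    out = np.zeros_like(rho, dtype=complex)
    for U in unitaries:
        Uk = reduce(np.kron, [U] * k)
        out += Uk @ rho @ Uk.conj().T
    return out / len(unitaries)

# k = 1: the Pauli average of any density matrix is the maximally mixed state,
# which is also the Haar average, so the Paulis form a unitary 1-design.
rng = np.random.default_rng(2)
a = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
rho = a @ a.conj().T
rho /= np.trace(rho).real
is_one_design = np.allclose(twirl(PAULIS, rho, 1), np.eye(2) / 2)

# k = 2: compare on the input |00><00|. The Haar average of a two-copy pure
# product state is the symmetric projector divided by its dimension, 3.
zero_zero = np.zeros((4, 4), dtype=complex)
zero_zero[0, 0] = 1
swap = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
haar_two_copy = (np.eye(4) + swap) / 6
is_two_design = np.allclose(twirl(PAULIS, zero_zero, 2), haar_two_copy)
```

Here `is_one_design` evaluates to True and `is_two_design` to False, which illustrates that matching first moments does not imply matching second moments.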


2.3. Approximate k-designs.

2.3.1. Approximate state designs Numerous examples of exact efficient state 2-design
constructions are known (e.g. [8]) but general exact constructions are not efficient in n
and k. Approximate state designs were first introduced by Ambainis and Emerson [3],
who constructed efficient approximate state k-designs for any k. Aaronson [1] also
gives an efficient approximate construction.

We  define  approximate  state  designs  as  follows. 

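Definition 2.3. An ensemble of quantum states $\{p_i, |\psi_i\rangle\}$ is an $\epsilon$-approximate state
k-design if

\[
(1 - \epsilon) \int_\psi \left( |\psi\rangle\langle\psi| \right)^{\otimes k} d\psi
\;\le\; \sum_i p_i \left( |\psi_i\rangle\langle\psi_i| \right)^{\otimes k}
\;\le\; (1 + \epsilon) \int_\psi \left( |\psi\rangle\langle\psi| \right)^{\otimes k} d\psi. \tag{2.7}
\]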

In [3], a similar definition was proposed but with the additional requirement that the
ensemble also forms a 1-design (exactly), i.e.

\[
\sum_i p_i\, |\psi_i\rangle\langle\psi_i| = \int_\psi |\psi\rangle\langle\psi|\, d\psi.
\]

This  requirement  was  necessary  there  only  so  that  a  suitably  normalised  version  of  the 
ensemble  would  form  a  POVM.  We  will  not  use  it. 

By taking the partial trace one can show that a k-design is a k'-design for k' < k.
Thus approximate k-designs are always at least approximate 1-designs.


2.3.2. Approximate unitary designs It was shown in Ref. [4] that a quantum analogue
of a one time pad requires 2n bits to exactly randomise an n qubit state. However, in
Ref. [5] it was shown that n + o(n) bits suffice to do this approximately. Translated into
k-design language, this says an exact unitary 1-design requires $2^{2n}$ unitaries but can be
done approximately with $2^{n + o(n)}$. So approximate designs can have fewer unitaries than
exact designs. Here, we are interested in improving the efficiency of implementing the
unitaries. There are no known efficient exact constructions of unitary k-designs; it is
hoped that our approach will yield approximate unitary designs efficiently.

We  will  require  approximate  unitary  k-designs  to  be  close  in  the  diamond  norm  [24]: 

Definition 2.4. The diamond norm of a superoperator T is

\[
\|T\|_\diamond = \sup_d \|T \otimes \mathrm{id}_d\|_\infty
= \sup_d \sup_{X \neq 0} \frac{\|(T \otimes \mathrm{id}_d) X\|_1}{\|X\|_1},
\]

where $\mathrm{id}_d$ is the identity channel on d dimensions.

Operationally,  the  diamond  norm  of  the  difference  between  two  quantum  operations  tells 
us  the  largest  possible  probability  of  distinguishing  the  two  operations  if  we  are  allowed 
to  have  them  act  on  part  of  an  arbitrary,  possibly  entangled,  state.  In  the  supremum  over 
ancilla  dimension  d,  it  can  be  shown  that  d  never  needs  to  be  larger  than  the  dimension  of 
the system that T acts upon. The diamond norm is closely related to completely bounded
norms (cb-norms), in that $\|T\|_\diamond$ is the cb-norm of $T^\dagger$ and can also be interpreted as the
$L_1 \to L_1$ cb-norm of T itself [11,27].
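
The role of the ancilla in Definition 2.4 can be seen in a toy calculation. The sketch below takes T to be the difference between the identity channel and the completely depolarising channel on one qubit (an example chosen here for illustration, not taken from the paper) and evaluates the ratio $\|(T \otimes \mathrm{id}_d) X\|_1 / \|X\|_1$ for two inputs. Each ratio is only a lower bound on the supremum in the definition; computing the diamond norm exactly in general requires an optimisation that is not attempted here.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
# Kraus operators of the completely depolarising channel rho -> tr(rho) I/2.
DEPOLARISING_KRAUS = [P / 2 for P in (I2, X, Y, Z)]

def trace_norm(M):
    """Sum of singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def difference_map(X_in, d_anc):
    """Apply (T x id_d) to X_in, where T = identity channel minus depolarising channel."""
    out = X_in.astype(complex)
    for K in DEPOLARISING_KRAUS:
        K_full = np.kron(K, np.eye(d_anc))
        out = out - K_full @ X_in @ K_full.conj().T
    return out

def ratio(X_in, d_anc):
    return trace_norm(difference_map(X_in, d_anc)) / trace_norm(X_in)

# Input without an ancilla: the pure state |0><0|.
product_input = np.diag([1.0, 0.0])
ratio_product = ratio(product_input, d_anc=1)        # 1.0

# Input entangled with a one-qubit ancilla: a maximally entangled state.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
entangled_input = np.outer(bell, bell.conj())
ratio_entangled = ratio(entangled_input, d_anc=2)    # 1.5
```

The entangled input gives a strictly larger ratio than the unentangled one, which is why the definition takes a supremum over ancilla dimension.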

We  can  now  define  approximate  unitary  k-designs. 

Definition 2.5. $\mathcal{G}_W$ is an $\epsilon$-approximate unitary k-design if

\[
\|\mathcal{G}_W - \mathcal{G}_H\|_\diamond \le \epsilon, \tag{2.8}
\]

where $\mathcal{G}_W$ and $\mathcal{G}_H$ are defined in Definition 2.2.

Ref. [10] considers approximate twirling, which is implemented using an approximate
2-design. It gives an alternative definition of closeness which is more convenient
for  this  application: 


Definition 2.6 ([10]). Let $\{p_i, U_i\}$ be an ensemble of unitary operators. Then this
ensemble is an $\epsilon$-approximate twirl if

\[
\max_\Lambda \left\| \mathbb{E}_W\, W(\Lambda(W^\dagger \rho W))W^\dagger - \mathbb{E}_U\, U(\Lambda(U^\dagger \rho U))U^\dagger \right\|_\diamond \le \frac{\epsilon}{d^2}, \tag{2.9}
\]

where the first expectation is over W chosen from the ensemble and the second is the
Haar average. The maximisation is over channels $\Lambda$ and d is the dimension ($2^n$ in our
case).


Our  results  work  for  both  definitions  with  the  same  efficiency. 

2.4. Random Circuits as k-designs. If a random circuit is to be an approximate k-design
then Eq. 2.8 must be satisfied where the $U_i$ are the different possible random circuits.
We can think of this as applying the random circuit not once but k times to k different
systems.

Suppose that applying t random gates yields the random circuit W. If $W^{\otimes k}$ acts on
an nk-qubit state $\rho$, then following the notation of Eq. 2.3, the resulting state is

\[
\rho_W := W^{\otimes k} \rho\, (W^\dagger)^{\otimes k}
= 2^{-nk/2} \sum_{p_1, \ldots, p_k} \gamma_0(p_1, \ldots, p_k)\, W \sigma_{p_1} W^\dagger \otimes \cdots \otimes W \sigma_{p_k} W^\dagger. \tag{2.10}
\]

For this to be a k-design, the expectation over all choices of random circuit should match
the expectation over Haar-distributed $W \in U(2^n)$.

We  are  now  ready  to  state  our  main  results.  Our  results  apply  to  a  large  class  of  gate 
sets  which  we  define  below: 

Definition 2.7. Let $\mathcal{E} = \{p_i, U_i\}$ be a discrete ensemble of elements from U(d). Define
an operator $G_{\mathcal{E}}$ by

\[
G_{\mathcal{E}} := \sum_i p_i\, U_i^{\otimes k} \otimes (U_i^*)^{\otimes k}. \tag{2.11}
\]

More generally, we can consider continuous distributions. If $\mu$ is a probability measure
on U(d) then we can define $G_\mu$ by analogy as

\[
G_\mu := \int_{U(d)} d\mu(U)\, U^{\otimes k} \otimes (U^*)^{\otimes k}. \tag{2.12}
\]

Then $\mathcal{E}$ (or $\mu$) is k-copy gapped if $G_{\mathcal{E}}$ (or $G_\mu$) has only k! eigenvalues with absolute
value equal to 1.

For any discrete ensemble $\mathcal{E} = \{p_i, U_i\}$, we can define a measure $\mu = \sum_i p_i \delta_{U_i}$. Thus,
it suffices to state our theorems in terms of $\mu$ and $G_\mu$.

The condition on $G_\mu$ in the above definition may seem somewhat strange. We will
see in Sect. 3 that when $d \ge k$ there is a k!-dimensional subspace of $(\mathbb{C}^d)^{\otimes 2k}$ that is acted
upon trivially by any $G_\mu$. Additionally, when $\mu$ is the Haar measure on U(d) then $G_\mu$
is the projector onto this space. Thus, the k-copy gapped condition implies that vectors
orthogonal to this space are shrunk by $G_\mu$.
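
The k-copy gapped condition of Definition 2.7 can be tested directly for small discrete ensembles by building the operator of Eq. 2.11 and counting its unit-modulus eigenvalues. The sketch below does this for the uniform ensemble over the four single-qubit Pauli operators (d = 2), an ensemble chosen here purely for illustration; the helper names are hypothetical.

```python
from functools import reduce
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def moment_operator(unitaries, probs, k):
    """Operator of Eq. 2.11: sum_i p_i U_i^{(x)k} (x) (U_i^*)^{(x)k}."""
    return sum(p * reduce(np.kron, [U] * k + [U.conj()] * k)
               for p, U in zip(probs, unitaries))

def count_unit_eigenvalues(G, tol=1e-9):
    """Number of eigenvalues of G whose absolute value equals 1 (up to tol)."""
    return int(np.sum(np.abs(np.abs(np.linalg.eigvals(G)) - 1) < tol))

uniform = [0.25] * 4
unit_eigs_k1 = count_unit_eigenvalues(moment_operator(PAULIS, uniform, 1))
unit_eigs_k2 = count_unit_eigenvalues(moment_operator(PAULIS, uniform, 2))
```

For k = 1 the count is 1 = 1!, so this ensemble is 1-copy gapped. For k = 2 the count is 4, which exceeds 2! = 2, so the Pauli ensemble is not 2-copy gapped; this is consistent with the Pauli operators not being a universal gate set.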

We will see that $G_\mu$ is k-copy gapped in a number of important cases. First, we give
a definition of universality that can apply not only to discrete gate sets, but to arbitrary
measures on U(4).

Definition 2.8. Let $\mu$ be a distribution on U(4). Suppose that for any open ball $S \subset U(4)$
there exists a positive integer $\ell$ such that $\mu^{*\ell}(S) > 0$. Then we say $\mu$ is universal [for
U(4)].

Here $\mu^{*\ell}$ is the $\ell$-fold convolution of $\mu$ with itself; i.e.

\[
\mu^{*\ell} = \int \delta_{U_1 \cdots U_\ell}\, d\mu(U_1) \cdots d\mu(U_\ell).
\]

When $\mu$ is a discrete distribution over a set $\{U_i\}$, Definition 2.8 is equivalent to the usual
definition of universality for a finite set of unitary gates.

Theorem 2.1. The following distributions on U(4) are k-copy gapped:

(i) Any universal gate set. Examples are U(4) itself, any entangling gate together with
all single qubit gates, or the gate set considered in [26].

(ii) Any approximate (or exact) unitary k-design on 2 qubits, such as the uniform
distribution over the 2-qubit Clifford group, which is an exact 2-design.


Proof. 

(i)  This  is  proven  in  Lemma  3.2. 

(ii)  This  follows  straight  from  Definition  2.2.  □ 

Theorem 2.2. Let $\mu$ be a 2-copy gapped distribution and W be a random circuit on n
qubits obtained by drawing t random unitaries according to $\mu$ and applying each of
them to a random pair of qubits. Then there exists C (depending only on $\mu$) such that
for any $\epsilon > 0$ and any $t \ge C(n(n + \log 1/\epsilon))$, $\mathcal{G}_W$ is an $\epsilon$-approximate unitary 2-design
according to either Definition 2.5 or Definition 2.6.


To prove Theorem 2.2, we show that the second moments of the random circuits converge
quickly to those of a uniform Haar distributed unitary. For W a circuit as in Theorem 2.2,
write $\gamma_W(p_1, p_2)$ for the Pauli coefficients of $\rho_W = W^{\otimes 2} \rho\, (W^\dagger)^{\otimes 2}$. Then write
$\gamma_t(p_1, p_2) = \mathbb{E}_W \gamma_W(p_1, p_2)$ where W is a circuit of length t. Then we have


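Lemma 2.1. Let $\mu$ and W be as in Theorem 2.2. Let the initial state be $\rho$ with $\gamma_0(p, p) \ge 0$
and $\sum_p \gamma_0(p, p) = 1$ (for example the state $|\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|$ for any pure state $|\psi\rangle$).
Then there exists a constant C (possibly depending on $\mu$) such that for any $\epsilon > 0$,

(i)

\[
\sum_{\substack{p_1, p_2 \\ p_1 p_2 \neq 00}} \left( \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{1}{2^n (2^n + 1)} \right)^2 \le \epsilon \tag{2.13}
\]

for $t \ge Cn \log 1/\epsilon$.

(ii)

\[
\sum_{\substack{p_1, p_2 \\ p_1 p_2 \neq 00}} \left| \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{1}{2^n (2^n + 1)} \right| \le \epsilon \tag{2.14}
\]

for $t \ge Cn(n + \log 1/\epsilon)$ or, when $\mu$ is the uniform distribution on U(4) or its
stabiliser subgroup, t ≥ Cn log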
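
The limiting value $1/(2^n(2^n + 1))$ appearing in Eqs. 2.13 and 2.14 can be checked against the Haar average directly. The sketch below assumes the standard fact (not derived in the text above) that the Haar average of $W^{\otimes 2} (|\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|) (W^\dagger)^{\otimes 2}$ is $(I + \mathrm{SWAP})/(d(d+1))$ with $d = 2^n$, and computes its Pauli coefficients in the normalisation of Eq. 2.3 with k = 2. The function names are illustrative.

```python
import itertools
from functools import reduce
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_string(p):
    return reduce(np.kron, [PAULIS[a] for a in p])

def haar_limit_coefficients(n):
    """Pauli coefficients gamma(p1, p2) of (I + SWAP) / (d (d + 1)) for d = 2^n."""
    d = 2 ** n
    swap = np.zeros((d * d, d * d))
    for a in range(d):
        for b in range(d):
            swap[b * d + a, a * d + b] = 1      # SWAP |a, b> = |b, a>
    rho_limit = (np.eye(d * d) + swap) / (d * (d + 1))
    strings = list(itertools.product(range(4), repeat=n))
    # For k = 2 the prefactor 2^{-nk/2} equals 1/d.
    return {
        (p1, p2): (np.trace(np.kron(pauli_string(p1), pauli_string(p2)) @ rho_limit) / d).real
        for p1 in strings for p2 in strings
    }

n = 2
d = 2 ** n
coeffs = haar_limit_coefficients(n)
identity = (0,) * n
diagonal = [c for (p1, p2), c in coeffs.items() if p1 == p2 and p1 != identity]
off_diagonal = [c for (p1, p2), c in coeffs.items() if p1 != p2]
```

Every entry of `diagonal` equals $1/(d(d+1))$ and every entry of `off_diagonal` vanishes, matching the target $\delta_{p_1 p_2} / (2^n(2^n + 1))$ in the lemma.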

We  can  then  extend  this  to  all  states  by  a  simple  corollary: 


Corollary  2.1.  Let  pi,  W  and  yw  be  as  in  Lemma  2.1.  Then,  for  any  initial  state  p  = 
jir  X/;,  p2  H) ( Ph  Pl)api  ®  <7p2 ,  there  exists  a  constant  C  (possibly  depending  on  pi) 
such  that  for  any  e  >  0, 

0) 


^  (  ,  x  s  HptoYo(p,p)\\ 

2 ^  [ytipu pi)  - sP1P2 — ^ — I  <e 

P1’P2  \  / 

P1P2^00 


(2.15) 


for  t  >  Cn(n  +  log  1  /e). 
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(ii) 


Z 

PI-P2 
PIP2^00 

for  t  >  Cn(n  +  log  1/e). 

By  the  usual  definition  of  an  approximate  design  (Definition  2.5),  we  only  need  con¬ 
vergence  in  the  2-norm  (Eq.  2.15),  which  is  implied  by  1-norm  convergence  (Eq.  2.16) 
but  weaker.  However,  Definition  2.6,  which  requires  the  map  to  be  close  to  the  twirling 
operation,  requires  1-norm  convergence  (i.e.  Eq.  2. 16).  Thus,  Theorem  2.2  for  Definition 
2.5  follows  from  Corollary  2.1(i)  and  Theorem  2.2  for  Definition  2.6  follows  from 
Corollary  2. 1  (ii).  Theorem  2.2  is  proved  in  Sect.  6  and  Corollary  2.1  in  Sect.  4. 

We  note  that,  in  the  course  of  proving  Lemma  2.1,  we  prove  that  the  eigenvalue  gap 
(defined  in  Sect.  4.3)  of  the  Markov  chain  that  gives  the  evolution  of  the  y(p,  p )  terms 
is  O (1/n).  It  is  easy  to  show  that  this  bound  is  tight  for  some  gate  sets. 

Related  work.  Here  we  summarise  the  other  efficient  constructions  of  approximate  uni¬ 
tary  2-designs. 

- The uniform distribution over the Clifford group on n qubits is an exact 2-design [14].
Moreover, [14] described how to sample from the Clifford group using $O(n^8)$ classical
gates and $O(n^3)$ quantum gates. Our results show that applying $O(n(n + \log 1/\epsilon))$
random two-qubit Clifford gates also achieves an $\epsilon$-approximate 2-design (although
not necessarily a distribution that is within $\epsilon$ of uniform on the Clifford group).

- Dankert et al. [10] gave a specific circuit construction of an approximate 2-design.
To achieve small error in the sense of Definition 2.5, their circuits require the same
$O(n(n + \log 1/\epsilon))$ gates that our random circuits do. However, when we use
Definition 2.6, the circuits from [10] only need $O(n \log 1/\epsilon)$ gates while the random
circuits analysed in this paper need to be of length $O(n(n + \log 1/\epsilon))$.

- The closest results to our own are in the papers by Oliveira et al. [9,26], which
considered a specific gate set (random single qubit gates and a controlled-NOT) and
proved that the second moments converge in time $O(n^2(n + \log 1/\epsilon))$. Our
strategy of analysing random quantum circuits in terms of classical Markov chains is
also adapted from [9,26]. In Sect. 3, we generalise this approach to analyse the kth
moments for arbitrary k.

The  main  results  of  our  paper  extend  the  results  of  [9,26]  to  a  larger  class  of  gate 
sets  and  improve  their  convergence  bounds.  Some  of  these  improvements  have  been 
conjectured  by  [30],  which  presented  numerical  evidence  in  support  of  them. 


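$$\sum_{(p_1,p_2) \neq (0,0)} \left| \gamma_t(p_1,p_2) - \delta_{p_1 p_2} \frac{\sum_{p \neq 0} \gamma_0(p,p)}{4^n - 1} \right| \le \epsilon \qquad (2.16)$$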


3.  Analysis  of  the  Moments 

In  order  to  prove  our  results,  we  need  to  understand  how  the  state  evolves  after  each 
step  of  the  random  circuit.  In  this  section  we  consider  just  one  step  and  a  fixed  pair  of 
qubits.  Later  on  we  will  extend  this  to  prove  convergence  results  for  multiple  steps  with 
random  pairs  of  qubits  drawn  at  every  step.  We  consider  first  the  Haar  distribution  over 
the  full  unitary  group  and  then  will  discuss  the  more  general  case  of  any  2-copy  gapped 
distribution. 

In this section, we work in general dimension d and with a general Hermitian orthogonal
basis $\sigma_0, \ldots, \sigma_{d^2-1}$. Later we will take d to be either 4 or $2^n$ and the $\sigma_i$ to be
Pauli matrices. However, in this section we keep the discussion general to emphasise
the potentially broader applications.

Fix an orthonormal basis for $d \times d$ Hermitian matrices: $\sigma_0, \ldots, \sigma_{d^2-1}$; normalised
so that $\mathrm{tr}\, \sigma_p \sigma_q = d\, \delta_{pq}$. Let $\sigma_0$ be the identity. We need to evaluate the quantity

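$$\mathbb{E}_U \left( U^{\otimes k}\, \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k}\, (U^\dagger)^{\otimes k} \right) =: T(\mathbf{p}), \qquad (3.1)$$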

where the expectation is over Haar distributed $U \in U(d)$. We will need this quantity in
two cases. Firstly, for $d = 2^n$, these are the moments obtained after applying a uniformly
distributed unitary so we know what the random circuit must converge to. Secondly, for
$d = 4$, this tells us how a random $U(4)$ gate acts on any chosen pair.

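Call the quantity in Eq. 3.1 $T(\mathbf{p})$ (we use bold to indicate a k-tuple of coefficients;
take $\mathbf{p} = (p_1, \ldots, p_k)$) and write it in the $\sigma_p$ basis as

$$T(\mathbf{p}) = \sum_{\mathbf{q}} G(\mathbf{q};\mathbf{p})\, \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k}. \qquad (3.2)$$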

Here,  G(q;  p)  is  the  coefficient  in  the  Pauli  expansion  of  T (p)  and  we  define  G  as  the 
matrix  with  entries  equal  to  G(q;  p).  We  have  left  off  the  usual  normalisation  factor 
because,  as  we  shall  see,  with  this  normalisation  G  is  a  projector.  Inverting  this,  we  have 

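$$G(\mathbf{q};\mathbf{p}) = d^{-k}\, \mathrm{tr}\left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k}\, T(\mathbf{p}) \right)$$

$$= d^{-k}\, \mathbb{E}_U\, \mathrm{tr}\left( (\sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k})\, U^{\otimes k} (\sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k}) (U^\dagger)^{\otimes k} \right). \qquad (3.3)$$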

Note  that  G  is  real  since  T  and  the  basis  are  Hermitian. 

We  can  gain  all  the  information  we  need  about  the  Haar  integral  in  Eq.  3.1  with  the 
following  observations: 

Lemma 3.1. $T(\mathbf{p})$ commutes with $U^{\otimes k}$ for any unitary U.

Proof. Follows from the invariance of the Haar measure on the unitary group.

Corollary 3.1. $T(\mathbf{p})$ is a linear combination of permutations from the symmetric group
$S_k$.

Proof. This follows from Schur-Weyl duality (see e.g. [16]).

From  this,  we  can  prove  that  G  is  a  projector  and  find  its  eigenvectors. 

Theorem  3.1.  G  is  symmetric,  i.e.  G(q;  p)  =  G(p;  q). 

Proof  Follows  from  the  invariance  of  the  trace  under  cyclic  permutations. 

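Theorem 3.2. $P_\pi$ is an eigenvector of G with eigenvalue 1 for any permutation operator
$P_\pi$, i.e.

$$\sum_{\mathbf{q}} G(\mathbf{p};\mathbf{q})\, \mathrm{tr}\left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} P_\pi \right) = \mathrm{tr}\left( \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k} P_\pi \right).$$

Further, any vector orthogonal to this set has eigenvalue 0.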
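Proof. For the first part,

$$\sum_{\mathbf{q}} G(\mathbf{p};\mathbf{q})\, \mathrm{tr}\left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} P_\pi \right)$$

$$= d^{-k} \sum_{\mathbf{q}} \mathbb{E}_U\, \mathrm{tr}\left( \sigma_{q_1} U \sigma_{p_1} U^\dagger \right) \cdots \mathrm{tr}\left( \sigma_{q_k} U \sigma_{p_k} U^\dagger \right) \mathrm{tr}\left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} P_\pi \right)$$

$$= d^{-k}\, \mathrm{tr}\left( P_\pi\, \mathbb{E}_U \sum_{q_1} \mathrm{tr}\left( \sigma_{q_1} U \sigma_{p_1} U^\dagger \right) \sigma_{q_1} \otimes \cdots \otimes \sum_{q_k} \mathrm{tr}\left( \sigma_{q_k} U \sigma_{p_k} U^\dagger \right) \sigma_{q_k} \right). \qquad (3.4)$$

Writing $U \sigma_p U^\dagger$ in the $\sigma_p$ basis, we find

$$d^{-1} \sum_q \mathrm{tr}\left( \sigma_q U \sigma_p U^\dagger \right) \sigma_q = U \sigma_p U^\dagger.$$

Therefore Eq. 3.4 becomes

$$\mathrm{tr}\left( P_\pi\, \mathbb{E}_U\, U \sigma_{p_1} U^\dagger \otimes \cdots \otimes U \sigma_{p_k} U^\dagger \right) = \mathrm{tr}\left( \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k} P_\pi \right).$$

For the second part, consider any vector v which is orthogonal to the permutation
operators (we can neglect the complex conjugate because $P_\pi$ is real in this basis), i.e.

$$\sum_{\mathbf{q}} \mathrm{tr}\left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} P_\pi \right) v(\mathbf{q}) = 0 \qquad (3.5)$$

for any permutation $\pi$. Then

$$\sum_{\mathbf{q}} G(\mathbf{p};\mathbf{q})\, v(\mathbf{q}) = d^{-k} \sum_{\mathbf{q}} \mathrm{tr}\left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} T(\mathbf{p}) \right) v(\mathbf{q})$$

which is zero since $T(\mathbf{p})$ is a linear combination of permutations and v is orthogonal to
this by Eq. 3.5. □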

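Theorem 3.3. $G^2 = G$, i.e. $\sum_{\mathbf{q}'} G(\mathbf{p};\mathbf{q}')\, G(\mathbf{q}';\mathbf{q}) = G(\mathbf{p};\mathbf{q})$.

Proof. Using Eq. 3.3,

$$\sum_{\mathbf{q}'} G(\mathbf{p};\mathbf{q}')\, G(\mathbf{q}';\mathbf{q}) = \sum_{\mathbf{q}'} G(\mathbf{p};\mathbf{q}')\, d^{-k}\, \mathrm{tr}\left( \sigma_{q'_1} \otimes \cdots \otimes \sigma_{q'_k} T(\mathbf{q}) \right).$$

From Corollary 3.1, $T(\mathbf{q})$ is a linear combination of permutations. This implies, using
Theorem 3.2, that

$$\sum_{\mathbf{q}'} G(\mathbf{p};\mathbf{q}')\, d^{-k}\, \mathrm{tr}\left( \sigma_{q'_1} \otimes \cdots \otimes \sigma_{q'_k} T(\mathbf{q}) \right) = d^{-k}\, \mathrm{tr}\left( \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k} T(\mathbf{q}) \right)$$

$$= G(\mathbf{p};\mathbf{q})$$

as required. □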

Corollary  3.2.  G  is  a  projector  so  has  eigenvalues  0  and  1. 

We now evaluate G and T for the cases of k = 1 and k = 2 since these are the cases
we are interested in for the remainder of the paper.
3.1. k = 1. The k = 1 case is clear: the random unitary completely randomises the
state. Therefore all terms in the expansion are set to zero apart from the identity, i.e.

$$T(p) = \begin{cases} \sigma_0, & p = 0 \\ 0, & p \neq 0. \end{cases} \qquad (3.6)$$
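As a small sanity check of Eq. 3.6 for d = 2, the sketch below replaces the Haar average by an average over the four Pauli matrices. This substitution is an assumption of the example, not part of the argument above: it relies on the standard fact that the Pauli group is a unitary 1-design, so the finite average reproduces the Haar first moment. All names (`PAULIS`, `twirl`, `first_moments`) are illustrative.

```python
# Single-qubit Pauli matrices as nested lists of complex numbers.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def twirl(sigma):
    """Average U sigma U^dagger over the four Pauli matrices
    (used here as a stand-in for the Haar average; see the lead-in)."""
    total = [[0j, 0j], [0j, 0j]]
    for u in PAULIS:
        term = matmul(matmul(u, sigma), dagger(u))
        for i in range(2):
            for j in range(2):
                total[i][j] += term[i][j] / 4
    return total


# first_moments[p] plays the role of T(p) in Eq. 3.6.
first_moments = [twirl(sigma) for sigma in PAULIS]
```

Under these assumptions `first_moments[0]` is the identity and the other three entries are the zero matrix, matching the two cases of Eq. 3.6.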


3.2. k = 2. For k = 2, there are just two permutation operators, identity $I$ and swap
$\mathcal{F}$. Therefore there are just two eigenvectors with non-zero eigenvalue (n > 1). In
normalised form, taking them to be orthogonal, their components are

$$f_1(q_1,q_2) = \delta_{q_1 0}\, \delta_{q_2 0},$$

$$f_2(q_1,q_2) = \frac{1}{\sqrt{d^2-1}}\, \delta_{q_1 q_2} (1 - \delta_{q_1 0}).$$

We will now prove three properties of G that we need:

1. $G(p_1,p_2;q_1,q_2) = 0$ if $p_1 \neq p_2$ or $q_1 \neq q_2$.

Proof. Consider the function $f(q_1,q_2) = \delta_{q_1 a}\delta_{q_2 b}$ with $a \neq b$. This function has
zero overlap with the eigenvectors $f_1$ and $f_2$ so it goes to zero when acted on by
G. Therefore $G(p_1,p_2;a,b) = 0$. The claim follows from the symmetry property
(Theorem 3.1). □

With this we will write $G(p;q)$ for $G(p_1,p_2;q_1,q_2)$ with $p_1 = p_2 = p$ and $q_1 = q_2 = q$.

2. $G(p;0) = \delta_{p0}$.

Proof. Let G act on eigenvector $f_1$. □

3. $G(p;a) = \frac{1}{d^2-1}$ for $a, p \neq 0$.

Proof. Let G act on the input $\delta_{qa}$. This has zero overlap with $f_1$ and overlap
$\frac{1}{\sqrt{d^2-1}}$ with $f_2$. □

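Therefore we have

$$G(p_1,p_2;q_1,q_2) = \begin{cases} 0, & p_1 \neq p_2 \text{ or } q_1 \neq q_2 \\ 1, & p_1 = p_2 = q_1 = q_2 = 0 \\ \frac{1}{d^2-1}, & p_1 = p_2 \neq 0,\ q_1 = q_2 \neq 0. \end{cases} \qquad (3.7)$$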
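The properties collected in Eq. 3.7 can be checked mechanically for the smallest case. The sketch below builds the 16 x 16 matrix G for d = 2 in exact rational arithmetic and tests that it is symmetric, idempotent and of rank 2, and that the component vectors of the identity and swap operators are fixed by it. The mixed case (one index zero, the other non-zero) is set to 0, which is what property 2 together with the symmetry of G gives. Variable and function names are illustrative.

```python
from fractions import Fraction
from itertools import product

d = 2  # single qubit, so the basis labels run over 0..3
labels = list(product(range(d * d), repeat=2))  # index pairs (p1, p2)


def g_entry(p, q):
    """Entry G(p1, p2; q1, q2) as given by Eq. 3.7."""
    (p1, p2), (q1, q2) = p, q
    if p1 != p2 or q1 != q2:
        return Fraction(0)
    if p1 == 0 and q1 == 0:
        return Fraction(1)
    if p1 != 0 and q1 != 0:
        return Fraction(1, d * d - 1)
    return Fraction(0)  # G(p; 0) = delta_{p0}, plus symmetry


G = [[g_entry(p, q) for q in labels] for p in labels]
size = len(labels)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]


is_symmetric = all(G[i][j] == G[j][i] for i in range(size) for j in range(size))
is_projector = matmul(G, G) == G
rank = sum(G[i][i] for i in range(size))  # trace of a projector equals its rank

# Components tr(sigma_q1 (x) sigma_q2 P) of the two permutation operators:
# identity gives d^2 delta_{q1,0} delta_{q2,0}; swap gives d delta_{q1,q2}.
identity_vec = [Fraction(d * d) if q == (0, 0) else Fraction(0) for q in labels]
swap_vec = [Fraction(d) if q[0] == q[1] else Fraction(0) for q in labels]
```

The two fixed vectors correspond to the two permutation operators for k = 2, consistent with Theorem 3.2 and with the rank of 2.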


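Since $T(p_1,p_2) = \sum_{q_1,q_2} G(p_1,p_2;q_1,q_2)\, \sigma_{q_1} \otimes \sigma_{q_2}$, we have

$$T(p_1,p_2) = \begin{cases} 0, & p_1 \neq p_2, \\ \sigma_0 \otimes \sigma_0, & p_1 = p_2 = 0, \\ \frac{1}{d^2-1} \sum_{p' \neq 0} \sigma_{p'} \otimes \sigma_{p'}, & p_1 = p_2 \neq 0. \end{cases} \qquad (3.8)$$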


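Therefore the terms $\sigma_{p_1} \otimes \sigma_{p_2}$ with $p_1 \neq p_2$ are set to zero. Further, the sum of the
diagonal coefficients $\gamma(p,p)$ is conserved. This allows us to identify this with a
probability distribution (after renormalising) and use Markov chain analysis. To see this, write
again the starting state

$$\rho = 2^{-n} \sum_{q_1,q_2} \gamma_0(q_1,q_2)\, \sigma_{q_1} \otimes \sigma_{q_2}$$

with the state after application of any unitary W,

$$\rho_W = 2^{-n} \sum_{q_1,q_2} \gamma_W(q_1,q_2)\, \sigma_{q_1} \otimes \sigma_{q_2} = 2^{-n} \sum_{q_1,q_2} \gamma_0(q_1,q_2)\, (W \sigma_{q_1} W^\dagger) \otimes (W \sigma_{q_2} W^\dagger).$$

Then

$$\sum_q \gamma_W(q,q) = 2^{-n} \sum_q \mathrm{tr}\left( \sigma_q \otimes \sigma_q\, \rho_W \right)$$

$$= \mathrm{tr}\left( \mathcal{F} \rho_W \right)$$

$$= 2^{-n} \sum_{q_1,q_2} \gamma_0(q_1,q_2)\, \mathrm{tr}\left[ \mathcal{F}\, (W \sigma_{q_1} W^\dagger) \otimes (W \sigma_{q_2} W^\dagger) \right]$$

$$= 2^{-n} \sum_{q_1,q_2} \gamma_0(q_1,q_2)\, \mathrm{tr}\left( \sigma_{q_1} \sigma_{q_2} \right)$$

$$= \sum_q \gamma_0(q,q)$$

as required, where $\mathcal{F}$ is the swap operator and we have used Lemmas A.2 and A.1.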


3.3. Moments for general universal random circuits. We now consider universal
distributions $\mu$ that in general may be different from the uniform (Haar) measure on $U(d)$.
Our main result in this section will be to show that a universal distribution on $U(4)$ is
also 2-copy gapped. In fact, we will phrase this result in slightly more general terms and
show that a universal distribution on $U(d)$ is also k-copy gapped for any k. Universality
(Definition 2.8) generalises in the obvious way to $U(d)$, whereas when we say that $\mu$ is
k-copy gapped, we mean that

$$\| \hat{G}_\mu - \hat{G}_{U(d)} \|_\infty < 1, \qquad (3.9)$$

where $\hat{G} = \mathbb{E}_U\, U^{\otimes k} \otimes (U^*)^{\otimes k}$, with the expectation taken over $\mu$ for $\hat{G}_\mu$ or over the
Haar measure for $\hat{G}_{U(d)}$.

The reason Eq. 3.9 represents our condition for $\mu$ to be k-copy gapped is as follows:
Observe that $\hat{G}$ and $G$ are unitarily related, so the definition of k-copy gapped could
equivalently be given in terms of $G$. We have shown above that $G_{U(d)}$ (and thus $\hat{G}_{U(d)}$)
has all eigenvalues equal to 0 or 1; i.e. is a projector. By contrast, $\hat{G}_\mu$ may not even
be Hermitian. However, we will prove below that all eigenvectors of $\hat{G}_{U(d)}$ with
eigenvalue 1 are also eigenvectors of $\hat{G}_\mu$ with eigenvalue 1. Thus, Eq. 3.9 will imply that
$\lim_{t \to \infty} (\hat{G}_\mu)^t = \hat{G}_{U(d)}$, just as we would expect for a gapped random walk.

We would like to show that Eq. 3.9 holds whenever $\mu$ is universal. This result was
proved in [6] (and was probably known even earlier) when $\mu$ had the form $(\delta_{U_1} + \delta_{U_2})/2$.
Here we show how to extend the argument to any universal $\mu$.

Lemma 3.2. Let $\mu$ be a distribution on $U(d)$. Then all eigenvectors of $\hat{G}_{U(d)}$ with
eigenvalue 1 are eigenvectors of $\hat{G}_\mu$ with eigenvalue one. Additionally, if $\mu$ is universal then
$\mu$ is k-copy gapped for any positive integer k (cf. Eq. 3.9).

In particular, if k = 2 this lemma implies that $\mu$ is 2-copy gapped (cf. Theorem 2.1).
Proof. Let $V \cong \mathbb{C}^d$ be the fundamental representation of $U(d)$, where the action of
$U \in U(d)$ is simply U itself. Let $V^*$ be its dual representation, where U acts as $U^*$.
The operators $\hat{G}_\mu$ and $\hat{G}_{U(d)}$ act on the space $V^{\otimes k} \otimes (V^*)^{\otimes k}$. We will see that $\hat{G}_{U(d)}$
is completely determined by the decomposition of $V^{\otimes k} \otimes (V^*)^{\otimes k}$ into irreducible
representations (irreps). Suppose that the multiplicity of $(r_\lambda, V_\lambda)$ in $V^{\otimes k} \otimes (V^*)^{\otimes k}$ is $m_\lambda$,
where the $V_\lambda$'s are the irrep spaces and $r_\lambda(U)$ the corresponding representation matrices.
In other words

$$V^{\otimes k} \otimes (V^*)^{\otimes k} \cong \bigoplus_\lambda V_\lambda \otimes \mathbb{C}^{m_\lambda}, \qquad (3.10)$$

$$U^{\otimes k} \otimes (U^*)^{\otimes k} \sim \sum_\lambda |\lambda\rangle\langle\lambda| \otimes r_\lambda(U) \otimes I_{m_\lambda}. \qquad (3.11)$$

Here $\sim$ indicates that the two sides are related by conjugation by a fixed (U-independent)
unitary.

Let $\lambda = 0$ denote the trivial irrep: i.e. $V_0 = \mathbb{C}$ and $r_0(U) = 1$ for all U. We claim
that $\mathbb{E}_U r_\lambda(U) = 0$ whenever $\lambda \neq 0$ and the expectation is taken over the Haar measure.
To show this, note that $\mathbb{E}_U r_\lambda(U)$ commutes with $r_\lambda(V)$ for all $V \in U(d)$ and thus, by
Schur's Lemma, we must have $\mathbb{E}_U r_\lambda(U) = cI$ for some $c \in \mathbb{C}$. However, by the
translation-invariance of the Haar measure we have $cI = \mathbb{E}_U r_\lambda(U) = \mathbb{E}_U r_\lambda(UV) = c\, r_\lambda(V)$
for all $V \in U(d)$. Since $\lambda \neq 0$, we cannot have $r_\lambda(V) = I$ for all V and so it must be
that c = 0.

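Thus, if we write $\hat{G}_{U(d)}$ and $\hat{G}_\mu$ using the basis on the RHS of Eq. 3.11, we have

$$\hat{G}_{U(d)} = |0\rangle\langle 0| \otimes I_{m_0}, \qquad (3.12)$$

where $|0\rangle\langle 0|$ is a projector onto the trivial irrep. On the other hand,

$$\hat{G}_\mu = |0\rangle\langle 0| \otimes I_{m_0} + \sum_{\lambda \neq 0} |\lambda\rangle\langle\lambda| \otimes \left( \int r_\lambda(U)\, d\mu(U) \right) \otimes I_{m_\lambda}. \qquad (3.13)$$

Thus, every eigenvector of $\hat{G}_{U(d)}$ with eigenvalue one is also fixed by $\hat{G}_\mu$. For the
remainder of the space, the direct sum structure means that

$$\| \hat{G}_{U(d)} - \hat{G}_\mu \|_\infty = \max_{\lambda \neq 0,\ m_\lambda \neq 0} \left\| \int r_\lambda(U)\, d\mu(U) \right\|_\infty. \qquad (3.14)$$

Note that this maximisation only includes $\lambda$ with $\dim V_\lambda > 1$. This is because
non-trivial one-dimensional irreps of $U(d)$ have the form $(\det U)^m$ for some non-zero
integer m. Under the map $U \mapsto e^{i\phi} U$, such irreps pick up a phase of $e^{imd\phi}$. However,
$U^{\otimes k} \otimes (U^*)^{\otimes k}$ is invariant under $U \mapsto e^{i\phi} U$. Thus $V^{\otimes k} \otimes (V^*)^{\otimes k}$ cannot contain any
non-trivial one-dimensional irreps.

Now suppose by contradiction that there exists $\lambda \neq 0$ with $m_\lambda \neq 0$ and
$\| \int r_\lambda(U)\, d\mu(U) \|_\infty = 1$. (We do not need to consider the case $\| \int r_\lambda(U)\, d\mu(U) \|_\infty > 1$, since
$\| r_\lambda(U) \|_\infty = 1$ for all U and $\| \cdot \|_\infty$ obeys the triangle inequality.) Indeed, the triangle
inequality further implies that there exists a unit vector $|v\rangle \in V_\lambda$ such that

$$\int d\mu(U)\, r_\lambda(U) |v\rangle = \omega |v\rangle,$$

for some $\omega \in \mathbb{C}$ with $|\omega| = 1$.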
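By the above argument we can assume that $\dim V_\lambda > 1$. Since $V_\lambda$ is irreducible, it
cannot contain a one-dimensional invariant subspace, implying that there exists $U_0 \in U(d)$
such that

$$|\langle v| r_\lambda(U_0) |v\rangle| \le 1 - \delta,$$

for some $\delta > 0$. Since $U \mapsto |\langle v| r_\lambda(U) |v\rangle|$ is continuous, there exists an open ball S
around $U_0$ such that $|\langle v| r_\lambda(U) |v\rangle| \le 1 - \delta/2$ for all $U \in S$. Define $\bar{S} := U(d) \setminus S$.

Now we use the fact that $\mu$ is universal to find an $\ell$ such that $\mu^{*\ell}(S) > 0$. Next, observe
that $\int d\mu^{*\ell}(U)\, \langle v| r_\lambda(U) |v\rangle = \omega^\ell$. Taking the absolute value of both sides yields

$$1 = \left| \int_{U(d)} d\mu^{*\ell}(U)\, \langle v| r_\lambda(U) |v\rangle \right|$$

$$\le \int_{U(d)} d\mu^{*\ell}(U)\, |\langle v| r_\lambda(U) |v\rangle|$$

$$= \int_{S} d\mu^{*\ell}(U)\, |\langle v| r_\lambda(U) |v\rangle| + \int_{\bar{S}} d\mu^{*\ell}(U)\, |\langle v| r_\lambda(U) |v\rangle|$$

$$\le \mu^{*\ell}(S) \left( 1 - \frac{\delta}{2} \right) + \left( 1 - \mu^{*\ell}(S) \right)$$

$$< 1,$$

a contradiction. We conclude that $\| \hat{G}_{U(d)} - \hat{G}_\mu \|_\infty < 1$. □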


4.  Convergence 

In Sect. 3 we saw that iterating any universal gate set on $U(d)$ eventually converges to
the uniform distribution on $U(d)$. Since the set of all two-qubit unitaries is universal on
$U(2^n)$, this implies that random circuits eventually converge to the Haar measure. In
this section, we turn to proving upper bounds on this convergence rate, focusing on the
first two moments.

Let $G^{(ij)}$ be the matrix with G (with d = 4) acting on qubits i and j and the identity
on the others. Then, if the pair (i, j) is chosen at step t, we can find the coefficients at
step t + 1 by multiplying by $G^{(ij)}$. In general, a random pair is chosen at each step. So

$$\gamma_{t+1}(\mathbf{p}) = \frac{1}{n(n-1)} \sum_{i \neq j} \sum_{\mathbf{q}} G^{(ij)}(\mathbf{p};\mathbf{q})\, \gamma_t(\mathbf{q}), \qquad (4.1)$$

where $\gamma_{t+1}$ are the expected coefficients at step t + 1. We can think of this evolution as
repeated application of the matrix

$$P = \frac{1}{n(n-1)} \sum_{i \neq j} G^{(ij)}. \qquad (4.2)$$

For k = 2, the key idea of Oliveira et al. [26] was to map the evolution of the $\gamma(p,p)$
coefficients to a Markov chain. The $\gamma(p_1,p_2)$ coefficients with $p_1 \neq p_2$ just decay as
each qubit is chosen and can be analysed directly.

However, we can only map the $\gamma(p,p)$ coefficients to a probability distribution when
they are non-negative, which is not the case for general states. Most of the rest of the
paper is dedicated to proving Lemma 2.1, which only applies to states with $\gamma(p,p) \ge 0$
and normalised so their sum is 1. Corollary 2.1 then extends this to all states:
Proof (of Corollary 2.1). Lemma 2.1 still applies to the $\gamma(p_1,p_2)$ terms with $p_1 \neq p_2$.
Therefore we just need to show how to apply Lemma 2.1 to states that initially have
some negative $\gamma(p,p)$ terms.

For the $\gamma(p,p)$ terms, Lemma 2.1 says that the random walk starting with any
initial probability distribution converges to uniform in some bounded time t. Let
$g_t(p,p;q,q)$ be the coefficients after t steps of the walk starting at a particular point q
(i.e. $g_0(p,p;q,q) = \delta_{pq}$). Now, for any starting state $\rho$, let the initial coefficients
be $\gamma_0(p,p)$. Then, by linearity, we can write the expected coefficients after t steps
$\gamma_t(p,p) := \mathbb{E}_W \gamma_W(p,p)$ as

$$\gamma_t(p,p) = \sum_{q \neq 0} \gamma_0(q,q)\, g_t(p,p;q,q) \qquad (4.3)$$

for $p \neq 0$.

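We can now prove convergence rates for the expected coefficients $\gamma_t(p,p)$:

(i) For the 2-norm, we have from Lemma 2.1 that for $t \ge Cn \log 1/\epsilon$,

$$\sum_{p \neq 0} \left( g_t(p,p;q,q) - \frac{1}{4^n - 1} \right)^2 \le \epsilon \qquad (4.4)$$

for any q. Note that the normalisation for the $\gamma(p,p)$ terms with $p \neq 0$ has changed
from Lemma 2.1 since we are neglecting the $\gamma(0,0)$ term here. Now

$$\sum_{p \neq 0} \left( \gamma_t(p,p) - \frac{\sum_{q \neq 0} \gamma_0(q,q)}{4^n - 1} \right)^2$$

$$= \sum_{p \neq 0} \left( \sum_{q \neq 0} \gamma_0(q,q) \left( g_t(p,p;q,q) - \frac{1}{4^n - 1} \right) \right)^2$$

$$\le \sum_{q \neq 0} \gamma_0(q,q)^2 \sum_{q' \neq 0} \sum_{p \neq 0} \left( g_t(p,p;q',q') - \frac{1}{4^n - 1} \right)^2$$

$$\le (4^n - 1)\, \epsilon \sum_{q \neq 0} \gamma_0(q,q)^2$$

$$\le 4^n \epsilon \sum_{q_1,q_2} \gamma_0(q_1,q_2)^2$$

$$= 4^n \epsilon\, \mathrm{tr}\, \rho^2$$

$$\le 4^n \epsilon,$$

where the first inequality is the Cauchy-Schwarz inequality. Therefore for
$t \ge Cn(n + \log 4^n/\epsilon)$, the 2-norm distance from stationarity for the $\gamma(p,p)$ terms is at
most $\epsilon$. Choose $C'$ such that $C'n(n + \log 1/\epsilon) \ge Cn(n + \log 4^n/\epsilon)$ to obtain the
result.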

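(ii) For the 1-norm, Lemma 2.1 says that for $t \ge Cn(n + \log 1/\epsilon)$,

$$\sum_{p \neq 0} \left| g_t(p,p;q,q) - \frac{1}{4^n - 1} \right| \le \epsilon. \qquad (4.5)$$

We can then proceed much as for the 2-norm case:

$$\sum_{p \neq 0} \left| \gamma_t(p,p) - \frac{\sum_{q \neq 0} \gamma_0(q,q)}{4^n - 1} \right|$$

$$= \sum_{p \neq 0} \left| \sum_{q \neq 0} \gamma_0(q,q) \left( g_t(p,p;q,q) - \frac{1}{4^n - 1} \right) \right|$$

$$\le \sum_{q \neq 0} |\gamma_0(q,q)| \sum_{p \neq 0} \left| g_t(p,p;q,q) - \frac{1}{4^n - 1} \right|$$

$$\le \epsilon \sum_{q \neq 0} |\gamma_0(q,q)|$$

$$\le 2^n \epsilon.$$

The last inequality follows from $|\sigma_q \otimes \sigma_q| = \sigma_0 \otimes \sigma_0$. Therefore for
$t \ge Cn(n + \log 2^n/\epsilon)$, the 1-norm distance from stationarity for the $\gamma(p,p)$ terms is at most $\epsilon$. □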

We now proceed to prove Lemma 2.1. Firstly, we will consider the simple case of
k = 1 to prove this process forms a 1-design as this will help us to understand the more
complicated case of k = 2.


4.1. First moments convergence. Recall that $\rho = 2^{-n/2} \sum_p \gamma(p)\, \sigma_p$ and we wish to
evaluate the moments of the coefficients. So for the first moments to converge, we want
to know $\mathbb{E}\, \gamma(p)$.

For k = 1, the $U(4)$ random circuit uniformly randomises each pair that is chosen.
More precisely, a pair of sites i, j are chosen at random and all the coefficients with
$p_i \neq 0$ or $p_j \neq 0$ are set to zero. Thus we get an exact 1-design when all sites have been
hit. For other gate sets, the terms do not decay to zero but decay by a factor depending
on the gap of G. Call the gap $\Delta$; for $U(4)$ $\Delta = 1$ and for others $0 < \Delta < 1$ and $\Delta$ is
independent of n. Therefore once each site has been hit m times the terms have decayed
by a factor $(1 - \Delta)^m$.

For a bound like the mixing time (see Sect. 4.3 for definition), we want to bound
the quantity $|\mathbb{E}_W \gamma_W(p)|$, where $\gamma_W(p)$ is the Pauli coefficient after applying the
random circuit W. We also want 2-norm bounds, so we bound $(\mathbb{E}_W \gamma_W(p))^2$ too.
We will in fact find bounds on $\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)|$ and $\sum_{p \neq 0} (\mathbb{E}_W |\gamma_W(p)|)^2$, which are
stronger.

A standard problem in the theory of randomised algorithms is the 'coupon collector'
problem. If a magazine comes with a free coupon, which is chosen uniformly randomly
from n different types, how many magazines should you buy to have a high probability
of getting all n coupons? It is not hard to show that $n \ln \frac{n}{\epsilon}$ samples (magazines) have at
least a $1 - \epsilon$ probability of including all n coupons. Using this, we expect all sites to
be hit with probability at least $1 - \epsilon$ after $O(n \log \frac{n}{\epsilon})$ steps. This argument can be made
precise in this context by bounding the non-identity coefficients. We find, as expected,
that the sum is small after $O(n \log n)$ steps:
Lemma 4.1. After $O(n \log 1/\epsilon)$ steps

$$\sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2 \le \epsilon,$$

and after $O(n \log \frac{n}{\epsilon})$ steps,

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le \epsilon. \qquad (4.6)$$

Proof. At each step, a pair of sites is chosen at random and any terms with non-identity
coefficients for this pair decay by a factor $(1 - \Delta)$. For example, the term $\sigma_1 \otimes \sigma_0$
decays whenever the first site is chosen. Thus the probability of each term decaying
depends on the number of zeroes. We start with the 1-norm bound.

Suppose the circuit applied after t steps is $W_t$. Consider $\mathbb{E}_{W_t} |\gamma_{W_t}(p)|$ for any p with
d non-zeroes. Since the state $\rho$ is physical, $\mathrm{tr}\, \rho^2 \le 1$, so $\sum_p \gamma_0^2(p) \le 1$. Now, in each
step, if any site is chosen where p is non-zero, this term decays by a factor $(1 - \Delta)$.
This occurs with probability $1 - \frac{(n-d)(n-d-1)}{n(n-1)} \ge d/n$, the probability of choosing a pair
where at least one site is non-zero. Therefore

$$\mathbb{E} |\gamma_{W_t}(p)| \le \left( (1 - \Delta)\, d/n + (1 - d/n) \right) |\gamma_{W_{t-1}}(p)|,$$

where the expectation is over the circuit applied at step t. If we iterate this t times we
find

$$\mathbb{E}_W |\gamma_W(p)| \le \exp(-\Delta t d/n)\, |\gamma_0(p)|,$$

where the expectation here is over all random circuits for the t steps. We now sum over
all p:

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le \sum_{d=1}^{n} \exp(-\Delta t d/n) \sum_{d(p)=d} |\gamma_0(p)|,$$

where d(p) is the number of non-zeroes in p. For the 1-norm bound, we can simply
bound $|\gamma_0(p)| \le 1$ to give $\sum_{d(p)=d} |\gamma_0(p)| \le \binom{n}{d} 3^d$ so

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le \left( 1 + 3 \exp(-\Delta t/n) \right)^n - 1,$$

where we have used the binomial theorem. Now let $t = \frac{n}{\Delta} \ln \frac{3n}{\epsilon}$. This gives

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le (1 + \epsilon/n)^n - 1 = O(\epsilon).$$
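The last step of the 1-norm bound can be checked numerically. The sketch below evaluates the right-hand side $(1 + 3e^{-\Delta t/n})^n - 1$ at the step count $t = (n/\Delta) \ln(3n/\epsilon)$, which is the value that makes $3e^{-\Delta t/n}$ equal to $\epsilon/n$. The function names and the sample values of n, the gap and $\epsilon$ are illustrative assumptions.

```python
import math


def one_norm_bound(n, gap, t):
    """Right-hand side (1 + 3 exp(-gap * t / n))**n - 1 of the 1-norm bound."""
    return (1 + 3 * math.exp(-gap * t / n)) ** n - 1


def steps_for(n, gap, eps):
    """Step count t = (n / gap) * ln(3 n / eps), chosen so that
    3 * exp(-gap * t / n) equals eps / n."""
    return (n / gap) * math.log(3 * n / eps)


# Illustrative parameters: n sites, gap of the two-qubit gate set, target eps.
eps = 0.01
checks = []
for n in (2, 10, 50):
    for gap in (1.0, 0.25):
        bound = one_norm_bound(n, gap, steps_for(n, gap, eps))
        # (1 + eps/n)^n - 1 lies between eps and e^eps - 1, hence is O(eps).
        checks.append((n, gap, bound))
```

Each recorded bound lies between $\epsilon$ and $e^\epsilon - 1$, which is the sense in which the sum is $O(\epsilon)$.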
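For the 2-norm bound,

$$\sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2 \le \sum_{p \neq 0} \exp(-2 \Delta t\, d(p)/n)\, \gamma_0^2(p)$$

$$= \sum_{d=1}^{n} \exp(-2 \Delta t d/n) \sum_{d(p)=d} \gamma_0^2(p)$$

$$\le \sum_{d=1}^{n} \exp(-2 \Delta t d/n)$$

$$\le \frac{\exp(-2 \Delta t/n)}{1 - \exp(-2 \Delta t/n)},$$

where we have used $\sum_p \gamma_0^2(p) \le 1$. We find after $\frac{n}{2\Delta} \ln 1/\epsilon$ steps that

$$\sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2 \le \frac{\epsilon}{1 - \epsilon}.$$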
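The coupon-collector estimate that motivates Lemma 4.1 can be verified exactly for small n. The sketch below computes, by inclusion-exclusion over the set of missing coupon types, the probability that m uniform samples contain all n types, and evaluates it at $m = \lceil n \ln(n/\epsilon) \rceil$. The function names are illustrative, and this models only the classical heuristic (one coupon per sample), not the pair selection of the random circuit.

```python
import math


def prob_all_collected(n, m):
    """Exact probability that m uniform draws from n coupon types include
    every type (inclusion-exclusion over the set of missing types)."""
    return sum((-1) ** j * math.comb(n, j) * (1 - j / n) ** m
               for j in range(n + 1))


def samples_needed(n, eps):
    """Sample count ceil(n ln(n / eps)) from the coupon-collector estimate."""
    return math.ceil(n * math.log(n / eps))


n, eps = 10, 0.1
m = samples_needed(n, eps)
success = prob_all_collected(n, m)
```

The union bound gives a failure probability of at most $n(1 - 1/n)^m \le n e^{-m/n} \le \epsilon$, so `success` is at least $1 - \epsilon$ for any choice of n and $\epsilon$.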


4.2. Second moments convergence. Firstly, the $\sigma_{p_1} \otimes \sigma_{p_2}$ terms for $p_1 \neq p_2$ decay in
a similar way to the non-identity terms in the 1-design analysis. In fact, the proof of
Lemma 4.1 carries over almost identically to this case to give

Lemma 4.2. After $O(n \log 1/\epsilon)$ steps

$$\sum_{p_1 \neq p_2} \left( \mathbb{E}_W |\gamma_W(p_1,p_2)| \right)^2 \le \epsilon$$

and after $O(n(n + \log 1/\epsilon))$ steps

$$\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1,p_2)| \le \epsilon.$$

Proof. Instead of the number of zeroes governing the decay rate, we need to count the
number of places where $p_1$ and $p_2$ differ. This gives

$$\mathbb{E} |\gamma_{W_t}(p_1,p_2)| \le \left( (1 - \Delta)\, d/n + (1 - d/n) \right) |\gamma_{W_{t-1}}(p_1,p_2)|,$$

where now d is the number of differing sites. There are $\binom{n}{d} 12^d 4^{n-d}$ states that differ in
d places so we find

$$\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1,p_2)| \le 4^n \left[ \left( 1 + 3 \exp(-\Delta t/n) \right)^n - 1 \right].$$

Set $t = \frac{n}{\Delta} (n \ln 4 + \ln 1/\epsilon)$ to make this $O(\epsilon)$. The 2-norm bound follows in the same
way as for Lemma 4.1. □
We now need to prove the $\gamma(p,p)$ terms converge quickly. We have seen above that
the sum of the terms $\gamma(p,p)$ is conserved and, for the purposes of proving Lemma 2.1,
we assume the sum is 1 and $\gamma(p,p) \ge 0$ for all p.

To illustrate the evolution, consider the simplest case when the gates are chosen from
$U(4)$. We have evaluated G in Sect. 3.2 for k = 2 for this case. Translated into
coefficients this yields the following update rule, where we have written it for the case when
qubits 1 and 2 are chosen:

$$\gamma_{t+1}(r_1, r_2, r_3, \ldots, r_n, s_1, s_2, s_3, \ldots, s_n) = \begin{cases} 0, & (r_1,r_2) \neq (s_1,s_2) \\ \gamma_t(0, 0, r_3, \ldots, r_n, 0, 0, s_3, \ldots, s_n), & (r_1,r_2) = (s_1,s_2) = (0,0) \\ \frac{1}{15} \sum_{(r'_1,r'_2) \neq (0,0)} \gamma_t(r'_1, r'_2, r_3, \ldots, r_n, r'_1, r'_2, s_3, \ldots, s_n), & (r_1,r_2) = (s_1,s_2) \neq (0,0). \end{cases}$$


The key idea of Oliveira et al. [26] was to map the evolution of the $\gamma(p,p)$ coefficients to
a Markov chain. We can apply this here to get, on state space $\{0, 1, 2, 3\}^n$, the evolution:

1. Choose a pair of sites uniformly at random.

2. If the state is 00 it remains 00.

3. Otherwise, choose the state uniformly at random from $\{0, 1, 2, 3\}^2 \setminus \{00\}$.

This is the correct evolution since, if the initial state is distributed according to $\gamma_t(q,q)$,
the final state is distributed according to $\gamma_{t+1}(p,p)$.
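The three steps of this chain can be turned into an explicit transition matrix for a small number of qubits. The sketch below does this for n = 3 in exact rational arithmetic and records a few quantities that can be inspected: row sums, symmetry, the fact that the all-zero state is absorbing, and the image of the uniform distribution on the remaining $4^n - 1$ states. The choice n = 3 and all names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product

n = 3  # number of qubits (kept small so the 4**n x 4**n matrix is tiny)
states = list(product(range(4), repeat=n))
index = {s: i for i, s in enumerate(states)}
site_pairs = list(combinations(range(n), 2))
nonzero_labels = [ab for ab in product(range(4), repeat=2) if ab != (0, 0)]

# P[x][y] = probability of moving from state x to state y in one step.
P = [[Fraction(0)] * len(states) for _ in states]
for s in states:
    for i, j in site_pairs:  # step 1: pair chosen uniformly at random
        weight = Fraction(1, len(site_pairs))
        if (s[i], s[j]) == (0, 0):  # step 2: 00 on the pair stays 00
            P[index[s]][index[s]] += weight
        else:  # step 3: resample the pair uniformly from the 15 non-zero labels
            for a, b in nonzero_labels:
                new = list(s)
                new[i], new[j] = a, b
                P[index[s]][index[tuple(new)]] += weight / 15

zero = index[(0,) * n]
uniform = [Fraction(0) if i == zero else Fraction(1, 4 ** n - 1)
           for i in range(len(states))]
after_one_step = [sum(uniform[x] * P[x][y] for x in range(len(states)))
                  for y in range(len(states))]
```

For this U(4) case the matrix is symmetric, so the chain restricted to the non-zero states is reversible with respect to the uniform distribution, and `after_one_step` coincides with `uniform`.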

The evolution for other gate sets will be similar, but the states will not be chosen
uniformly randomly in the third step. However, the state 00 will remain 00 and the
stationary distribution on the other 15 states is the same. We will find the convergence
times for general gate sets and then consider the $U(4)$ gate set since we can perform a
tight analysis for this case.


4.3.  Markov  chain  analysis.  Before  finding  the  convergence  rate  for  our  problem,  we 
will  briefly  introduce  the  basics  of  Markov  chain  mixing  time  analysis.  All  of  these 
standard  results  can  be  found  in  Ref.  [25]  and  references  therein. 

A process is Markov if the evolution only depends on the current state rather than the
full state history. Therefore the evolution of the state can be thought of as a matrix, the
transition matrix, acting on a vector which represents the current distribution. We will
only be interested in discrete time processes so the state after t steps is given by the t-th
power of the transition matrix acting on the initial distribution.

We  say  a  Markov  chain  is  irreducible  if  it  is  possible  to  get  from  one  state  to  any 
other  state  in  some  number  of  steps.  Further,  a  chain  is  aperiodic  if  it  does  not  return  to 
a  state  at  regular  intervals.  If  a  chain  is  both  irreducible  and  aperiodic  then  it  is  said  to  be 
ergodic.  A  well  known  result  of  Markov  chain  theory  is  that  all  ergodic  chains  converge 
to  a  unique  stationary  distribution.  In  matrix  language  this  says  that  the  transition  matrix 
P  has  eigenvalue  1  with  no  multiplicity  and  all  other  eigenvalues  have  absolute  value 
strictly  less  than  1.  We  will  also  need  the  notion  of  reversibility.  A  Markov  chain  is 
reversible  if  the  time  reversed  chain  has  the  same  transition  matrix,  with  respect  to  some 
distribution.  This  condition  is  also  known  as  detailed  balance : 


$$\pi(x) P(x,y) = \pi(y) P(y,x). \qquad (4.8)$$

It can be shown that a reversible ergodic Markov chain is only reversible with respect
to the stationary distribution. So above $\pi(x)$ is the stationary distribution of P. An
immediate consequence of this is that for a chain with uniform stationary distribution,
it is reversible if and only if it is symmetric (i.e. $P(x,y) = P(y,x)$). Note also that
reversible chains have real eigenvalues, since they are similar to the symmetric matrix
$\sqrt{\pi(x)}\, P(x,y) / \sqrt{\pi(y)}$.
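To make the similarity argument concrete, the sketch below takes a small reversible chain with a non-uniform stationary distribution, checks detailed balance (Eq. 4.8) exactly, and forms the matrix with entries $\sqrt{\pi(x)}\, P(x,y) / \sqrt{\pi(y)}$, which should be symmetric. The three-state chain is a made-up example, not one taken from the paper.

```python
import math
from fractions import Fraction

# A hypothetical three-state birth-death chain (rows sum to 1).
P = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
    [Fraction(0), Fraction(1, 2), Fraction(1, 2)],
]
# Its stationary distribution, obtained by solving detailed balance by hand.
pi = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

size = len(pi)

# Detailed balance: pi(x) P(x, y) == pi(y) P(y, x) for all x, y.
detailed_balance = all(pi[x] * P[x][y] == pi[y] * P[y][x]
                       for x in range(size) for y in range(size))

# Stationarity: (pi P)(y) == pi(y).
pi_next = [sum(pi[x] * P[x][y] for x in range(size)) for y in range(size)]

# Similar matrix S = D^{1/2} P D^{-1/2} with D = diag(pi); it has the same
# eigenvalues as P and is symmetric when detailed balance holds.
S = [[math.sqrt(pi[x]) * float(P[x][y]) / math.sqrt(pi[y])
      for y in range(size)] for x in range(size)]
```

Because S is a real symmetric matrix with the same spectrum as P, the eigenvalues of P are real, as stated above.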


With these definitions and concepts, we can now ask how quickly the Markov chain
converges to the stationary distribution. This is normally defined in terms of the 1-norm
mixing time. We use (half the) 1-norm distance to measure distances between
distributions:

$$\| s - t \| = \frac{1}{2} \| s - t \|_1 = \frac{1}{2} \sum_i |s_i - t_i|. \qquad (4.9)$$

We assume all distributions are normalised so then $0 \le \| s - t \| \le 1$. We can now define
the mixing time:

Definition 4.1. Let $\pi$ be the stationary distribution of P. Then if P is ergodic the mixing
time $\tau$ is

$$\tau(\epsilon) = \max_s \min \{ t \ge 0 : \| P^t s - \pi \| \le \epsilon \}. \qquad (4.10)$$

We will also use the (weaker) 2-norm mixing time (note this is not the same as $\tau_2$ in
Ref. [25]):

Definition 4.2. Let $\pi$ be the stationary distribution of P. Then if P is ergodic the 2-norm
mixing time is

$$\tau_2(\epsilon) = \max_s \min \{ t \ge 0 : \| P^t s - \pi \|_2 \le \epsilon \}. \qquad (4.11)$$

Unless otherwise stated, when we say mixing time we are referring to the 1-norm mixing
time.

There are many techniques for bounding the mixing time, including finding the
second largest eigenvalue of P. This gives a good measure of the mixing time because
components parallel to the second largest eigenvector decay the slowest. We have (for
reversible ergodic chains)

Theorem 4.1 (see Ref. [25], Corollary 1.15).

$$\tau(\epsilon) \le \frac{1}{\Delta} \ln \frac{1}{\pi_* \epsilon},$$

where $\pi_* = \min_x \pi(x)$ and $\Delta = \min(1 - \lambda_2, 1 + \lambda_{\min})$, where $\lambda_2$ is the second largest
eigenvalue and $\lambda_{\min}$ is the smallest. $\Delta$ is known as the gap.
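Definition 4.1 and Theorem 4.1 can be compared on a toy chain. The sketch below computes the mixing time by brute force for a symmetric two-state chain whose eigenvalues are 1 and 1/2, and evaluates the bound of Theorem 4.1. Two assumptions are worth flagging: distributions are treated as row vectors updated by $s'(y) = \sum_x s(x) P(x,y)$, and the maximum over starting distributions is taken over point masses only, which suffices because the distance to $\pi$ is convex in s and non-increasing in t. The chain and all names are illustrative.

```python
import math
from fractions import Fraction


def step(P, s):
    """One step of the chain: s'(y) = sum_x s(x) P(x, y)."""
    size = len(s)
    return [sum(s[x] * P[x][y] for x in range(size)) for y in range(size)]


def distance(s, t):
    """Half the 1-norm distance between two distributions (Eq. 4.9)."""
    return sum(abs(a - b) for a, b in zip(s, t)) / 2


def mixing_time(P, pi, eps):
    """Brute-force tau(eps): worst case over point-mass starting states of the
    first time at which the distance to pi is at most eps."""
    worst = 0
    for start in range(len(pi)):
        s = [Fraction(int(i == start)) for i in range(len(pi))]
        t = 0
        while distance(s, pi) > eps:
            s = step(P, s)
            t += 1
        worst = max(worst, t)
    return worst


# Hypothetical symmetric two-state chain: eigenvalues 1 and 1/2, so gap 1/2.
P = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(3, 4)]]
pi = [Fraction(1, 2), Fraction(1, 2)]
eps = Fraction(1, 8)
gap = 0.5

tau = mixing_time(P, pi, eps)
bound = (1 / gap) * math.log(1 / (float(min(pi)) * float(eps)))
```

For this chain the distance from a point mass after t steps is $(1/2)^{t+1}$, so `tau` is 2, while the bound evaluates to $2 \ln 16 \approx 5.5$; the theorem gives an upper bound that need not be tight.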

If the chain is irreversible, it may not even have real eigenvalues. However, we can
bound the mixing time in terms of the eigenvalues of the reversible matrix $PP^*$, where
$P^*(x,y) = P(y,x)$. In this case we have ([25], Corollary 1.14)

$$\tau(\epsilon) \le \frac{2}{\Delta_{PP^*}} \ln \frac{1}{\pi_* \epsilon}, \qquad (4.12)$$

where now $\Delta_{PP^*}$ is the gap of the chain $PP^*$. Note that for a reversible chain $P = P^*$
and $\Delta_{PP^*} \approx 2\Delta$, so the bounds are approximately the same.

This can also be converted into a 2-norm mixing time bound:

$$\tau_2(\epsilon) \le \frac{2}{\Delta_{PP^*}} \ln 1/\epsilon. \qquad (4.13)$$

To bound the gap, we will use the comparison theorem in Theorem 4.2 below. In this
theorem, we are thinking of the Markov chain as a directed graph where the vertices
are the states and there are edges for allowed transitions (i.e. transitions with non-zero
probability). For irreducible chains, it is possible to make a path from any vertex to any
other; we call the path length the number of transitions in such a path (which will in
general depend on the choice of path).

Theorem 4.2 (see Ref. [25], Theorem 2.14). Let $P$ and $\tilde{P}$ be two Markov chains on the
same state space $\Omega$ with the same stationary distribution $\pi$. Then, for every $x \neq y \in \Omega$
with $\tilde{P}(x,y) > 0$ define a directed path $\gamma_{xy}$ from x to y along edges in P and let its
length be $|\gamma_{xy}|$. Let $\Gamma$ be the set of all such paths. Then

$$\Delta \ge \tilde{\Delta} / A$$

for the gaps $\Delta$ and $\tilde{\Delta}$ where

$$A = A(\Gamma) = \max_{a \neq b,\ P(a,b) \neq 0} \frac{1}{\pi(a) P(a,b)} \sum_{x \neq y :\ (a,b) \in \gamma_{xy}} \pi(x)\, \tilde{P}(x,y)\, |\gamma_{xy}|.$$


For example, when comparing 1-dimensional random walks there is no choice in the
paths; they must pass through every point between $x$ and $y$. Further, the walk can only
progress one step at a time so (without loss of generality, for reversible chains) let
$b = a + 1$ to give

\[ A = \max_a \frac{1}{\pi(a) P(a, a+1)} \sum_{x \le a} \sum_{y \ge a+1} \pi(x) \hat{P}(x, y) (y - x) = \max_a \frac{\hat{P}(a, a+1)}{P(a, a+1)}. \tag{4.14} \]
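As an illustrative check of Eq. 4.14, the sketch below evaluates the comparison constant $A$ of Theorem 4.2 directly from its definition for two nearest-neighbour walks on a line. The two example chains (a symmetric walk on four points and its lazy version) and all function names are assumptions chosen for illustration; they are not chains used in this paper.

```python
from fractions import Fraction

def comparison_constant(P, P_hat, pi):
    """A from Theorem 4.2 for chains on {0, ..., m-1} with monotone paths.

    The path from x to y visits every point in between, so it uses the
    edge (a, a+1) iff x <= a < y, and the edge (a, a-1) iff y < a <= x.
    Assumes every edge needed by a path is present in P.
    """
    m = len(pi)
    best = Fraction(0)
    for a in range(m):
        for b in (a - 1, a + 1):
            if not 0 <= b < m or P[a][b] == 0:
                continue
            total = Fraction(0)
            for x in range(m):
                for y in range(m):
                    if x == y or P_hat[x][y] == 0:
                        continue
                    forward = b == a + 1 and x <= a and y >= b
                    backward = b == a - 1 and x >= a and y <= b
                    if forward or backward:
                        total += pi[x] * P_hat[x][y] * abs(y - x)
            best = max(best, total / (pi[a] * P[a][b]))
    return best

# Hypothetical example: symmetric walk on 4 points (holding at the ends)
# and its lazy version; both have the uniform stationary distribution.
m = 4
P = [[Fraction(0)] * m for _ in range(m)]
for a in range(m):
    for b in (a - 1, a + 1):
        if 0 <= b < m:
            P[a][b] = Fraction(1, 2)
    P[a][a] = 1 - sum(P[a])
P_hat = [[(P[a][b] + (1 if a == b else 0)) / 2 for b in range(m)] for a in range(m)]
pi = [Fraction(1, m)] * m

A = comparison_constant(P, P_hat, pi)
ratio = max(P_hat[a][a + 1] / P[a][a + 1] for a in range(m - 1))
```

Here both `A` and `ratio` come out as 1/2, matching the right-hand side of Eq. 4.14 for this pair of chains.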


A  generalisation  of  the  comparison  theorem  involves  constructing  flows,  which  are 
weighted  sets  of  paths  between  states.  This  can  give  a  tighter  bound  since  bottlenecks 
are  averaged  over.  This  gives  a  modified  comparison  theorem: 


Theorem 4.3 ([12], Theorem 2.3). Let $P$ and $\hat{P}$ be two Markov chains on the same
state space $\Omega$ with the same stationary distribution $\pi$. Then, for every $x \neq y \in \Omega$ with
$\hat{P}(x, y) > 0$, construct a set of directed paths $\mathcal{P}_{xy}$ from $x$ to $y$ along edges in $P$. We
define the flow function $f$ which maps each path $\gamma_{xy} \in \mathcal{P}_{xy}$ to a real number in the
interval $[0, 1]$ such that

\[ \sum_{\gamma_{xy} \in \mathcal{P}_{xy}} f(\gamma_{xy}) = \hat{P}(x, y). \]

Again, let the length of each path be $|\gamma_{xy}|$. Then

\[ \Delta \ge \hat{\Delta} / A \]

for the gaps $\Delta$ and $\hat{\Delta}$ where

\[ A = \max_{a \neq b,\ P(a,b) \neq 0} \frac{1}{\pi(a) P(a, b)} \sum_{x \neq y,\ \gamma_{xy} \in \mathcal{P}_{xy} :\ (a,b) \in \gamma_{xy}} \pi(x) f(\gamma_{xy}) |\gamma_{xy}|. \tag{4.15} \]

Note that we recover the comparison theorem when there is just one path between each
$x$ and $y$.

4.3.1. log-Sobolev constant. We will need tighter, but more complicated, mixing time
results to prove the tight result for the $U(4)$ case. We use the log-Sobolev constant:

Definition 4.3. The log-Sobolev constant $\rho$ of a chain with transition matrix $P$ and
stationary distribution $\pi$ is

\[ \rho = \min_f \frac{\sum_{x \neq y} (f(x) - f(y))^2 P(x, y) \pi(y)}{\sum_x \pi(x) f(x)^2 \log \frac{f(x)^2}{\sum_y \pi(y) f(y)^2}}. \]

The  mixing  time  result  is: 

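Lemma 4.3 (see Ref. [13], Theorem 3.7'). The mixing time of a finite, reversible,
irreducible Markov chain is

\[ \tau(\epsilon) = O\left( \frac{1}{\rho} \log \log \frac{1}{\pi_*} + \frac{1}{\Delta} \log \frac{d}{\epsilon} \right), \tag{4.16} \]

where $\rho$ is the Sobolev constant, $\pi_*$ is the smallest value of the stationary distribution,
$\Delta$ is the gap and $d$ is the size of the state space.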

Further, the comparison theorem (Theorem 4.2) works just the same to give

\[ \rho \ge \hat{\rho} / A. \]


We  will  need  one  more  result,  due  to  Diaconis  and  Saloff-Coste: 

Lemma 4.4 ([13], Lemma 3.2). Let $P_i$, $i = 1, \ldots, d$, be Markov chains with gaps $\Delta_i$
and Sobolev constants $\rho_i$. Now construct the product chain $P$. This chain has state space
equal to the product of the spaces for the chains $P_i$ and at each step one of the chains is
chosen at random and run for one step. Then $P$ has spectral gap given by:

\[ \Delta = \frac{1}{d} \min_i \Delta_i \]

and Sobolev constant:

\[ \rho = \frac{1}{d} \min_i \rho_i. \]
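The gap statement of Lemma 4.4 can be checked by hand on a toy case. The sketch below builds the product of two two-state chains (hypothetical parameters, chosen so that all eigenvalues are positive and the gap is simply one minus the second-largest eigenvalue) and verifies that tensor products of the factor eigenvectors are eigenvectors of the product chain with the averaged eigenvalue. It checks only the spectral gap, not the Sobolev constant.

```python
from fractions import Fraction
from itertools import product

def two_state(a, b):
    """Two-state chain with P(0,1) = a, P(1,0) = b; eigenvalues 1 and 1-a-b."""
    return [[1 - a, a], [b, 1 - b]]

def product_chain(chains):
    """Pick one of the d chains uniformly at random and run it for one step."""
    d = len(chains)
    states = list(product(range(2), repeat=d))
    P = {s: {t: Fraction(0) for t in states} for s in states}
    for s in states:
        for i, Pi in enumerate(chains):
            for new in range(2):
                t = s[:i] + (new,) + s[i + 1:]
                P[s][t] += Fraction(1, d) * Pi[s[i]][new]
    return states, P

def apply_matrix(P, states, v):
    return {s: sum(P[s][t] * v[t] for t in states) for s in states}

# Hypothetical factor chains with gaps 1/2 and 3/4.
a1, b1 = Fraction(1, 3), Fraction(1, 6)
a2, b2 = Fraction(1, 4), Fraction(1, 2)
chains = [two_state(a1, b1), two_state(a2, b2)]
vecs = [[(1, 1), (a1, -b1)], [(1, 1), (a2, -b2)]]   # right eigenvectors
eigs = [[1, 1 - a1 - b1], [1, 1 - a2 - b2]]

states, P = product_chain(chains)
spectrum = []
for k0, k1 in product(range(2), repeat=2):
    v = {s: vecs[0][k0][s[0]] * vecs[1][k1][s[1]] for s in states}
    lam = Fraction(eigs[0][k0] + eigs[1][k1]) / 2
    # v is an eigenvector of the product chain with eigenvalue lam
    assert apply_matrix(P, states, v) == {s: lam * v[s] for s in states}
    spectrum.append(lam)

ordered = sorted(spectrum, reverse=True)
gap = 1 - ordered[1]
factor_gaps = [a1 + b1, a2 + b2]
```

For these parameters `gap` is 1/4, which equals `min(factor_gaps) / 2`, as the lemma predicts for $d = 2$.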
4.4. Convergence proof. We now prove the Markov chain convergence results to show
that the $\gamma(p, p)$ terms converge quickly. We have already shown that the $\gamma(p_1, p_2)$
terms with $p_1 \neq p_2$ converge quickly and that there is no mixing between these terms
and the $\gamma(p, p)$ terms. Therefore, in this section, we remove such terms from $G$.

We want to prove the Markov chain with transition matrix (Eq. 4.2)

\[ P = \frac{1}{n(n-1)} \sum_{i \neq j} G^{(ij)} \]

converges quickly. Firstly, we know from Sect. 3.3 that $P$ has two eigenvectors with
eigenvalue 1. The first is the identity state ($\sigma_0 \otimes \sigma_0$) and the second is the uniform sum
of all non-identity terms ($\frac{1}{4^n - 1} \sum_{p \neq 0} \sigma_p \otimes \sigma_p$). From now on, we remove the identity
state. This makes the chain irreducible. Since we know it converges, it must also be
aperiodic, so the chain is ergodic and all other eigenvalues are strictly between 1 and $-1$.

We  show  here  that  the  gap  of  this  chain,  up  to  constants,  does  not  depend  on  the 
choice  of  2-copy  gapped  gate  set.  In  the  second  half  of  the  paper  we  find  a  tight  bound 
on  the  gap  for  the  U  (4)  case  which  consequently  gives  a  tight  bound  on  the  gap  for  all 
universal  sets. 

Since the stationary distribution is uniform, the chain is reversible if and only if $P$ is
a symmetric matrix. A sufficient condition for $P$ to be symmetric is for $G^{(ij)}$ to be
symmetric. We saw in Theorem 3.1 that for the $U(4)$ gate set case $G^{(ij)}$ is symmetric. In fact,
the proof works identically to show that $G^{(ij)}$ is symmetric for any gate set, provided
the set is invariant under Hermitian conjugation. However, 2-copy gapped gate sets do
not necessarily have this property so the Markov chain is not necessarily reversible. We
will find equal bounds (up to constants) for the gaps of both $P$ (if $G$ is symmetric) and
$PP^*$ (if $G$ is not symmetric) below:


Theorem 4.4. Let $\mu$ be any 2-copy gapped distribution of gates. If $\mu$ is invariant under
Hermitian conjugation then let $\Delta_P$ be the eigenvalue gap of the resulting Markov chain
matrix $P$. Then

\[ \Delta_P = \Omega(\Delta_{U(4)}), \tag{4.17} \]

where $\Delta_{U(4)}$ is the eigenvalue gap of the $U(4)$ chain. If $\mu$ is not invariant under
Hermitian conjugation, then let $\Delta_{PP^*}$ be the eigenvalue gap of the resulting Markov
chain matrix $PP^*$. Then

\[ \Delta_{PP^*} = \Omega(\Delta_{U(4)}). \tag{4.18} \]

Proof. We will use the comparison method with flows (Theorem 4.3). Firstly consider
the case where $\mu$ is closed under Hermitian conjugation, i.e. $G$ is symmetric.

We will compare $P$ to the $U(4)$ chain, which we call $P_{U(4)}$. Recall that this chain
chooses a pair at random and does nothing if the pair is 00 and chooses a random state
from $\{0, 1, 2, 3\}^2 \setminus \{00\}$ otherwise.

To apply Theorem 4.3, we need to construct the flows between transitions in $P_{U(4)}$.
We will choose paths such that only one pair is modified throughout. For example (with
$n = 4$), the transition $1000 \to 2000$ is allowed in $P_{U(4)}$. To construct a path in $P$, we
need to find allowed transitions between these two states in $P$. $G$ may not include the
transition $10 \to 20$ directly; however, $G$ is irreducible on this subspace of just two pairs.
This means that a path exists and can be of maximum length 14 if it has to cycle through
all intermediate states (in fact, since $G$ is symmetric the maximum path length is 8; all
that is important here is that it is constant). For example, the transitions $10 \to 11 \to 20$
might be allowed. Then we could choose the full path to be $1000 \to 1100 \to 2000$. In
this case we have chosen the path to involve transitions pairing sites 1 and 2. However,
we could equally well have chosen any pairing; we could pair the first site with any of
the others. We can choose 3 paths in this way. For this example, the flow we want to
choose will be all 3 of these paths equally weighted. We now use this idea to construct
flows between all transitions in $P_{U(4)}$ to prove the result.

Let $x \neq y \in \Omega$ and let $d(x, y)$ be the Hamming distance between the states
($d(x, y)$ gives the number of places at which $x$ and $y$ differ). There are two cases
where $P_{U(4)}(x, y) \neq 0$:

1. $d(x, y) = 2$. Here we must choose a unique pairing, specified by the two sites that
differ. Make all transitions in $P$ using this pair giving just one path.

2. $d(x, y) = 1$. For this case, choose all possible pairings of the changing site that give
allowed transitions in $P_{U(4)}$. For each pairing, construct a path in $P$ modifying only
this pair. If the differing site is initially non-zero then there are $n - 1$ such pairings;
if the differing site is initially zero then there are $n - z(x)$ pairings where $z(x)$ is
the number of zeroes in the state $x$.

All the above paths are of constant length since we have to (at most) cycle through all
states of a pair. We must now choose the weighting $f(\gamma_{xy})$ for each path such that

\[ \sum_{\gamma_{xy} \in \mathcal{P}_{xy}} f(\gamma_{xy}) = P_{U(4)}(x, y), \tag{4.19} \]

where $\mathcal{P}_{xy}$ is the set of all paths from $x$ to $y$ constructed above. We choose the weighting
of each path to be uniform. We just need to calculate the number of paths in $\mathcal{P}_{xy}$ to find
$f$:

1. $d(x, y) = 2$. There is just one path so $f(\gamma_{xy}) = P_{U(4)}(x, y) = \Theta(1/n^2)$.

2. $d(x, y) = 1$. If the differing site is initially non-zero then $P_{U(4)}(x, y) = \Theta(1/n)$
and there are $n - 1$ paths so $f(\gamma_{xy}) = \frac{P_{U(4)}(x, y)}{n - 1} = \Theta(1/n^2)$. If the differing site
is initially zero then $P_{U(4)}(x, y) = \Theta\left(\frac{n - z(x)}{n^2}\right)$ and there are $n - z(x)$ paths so
$f(\gamma_{xy}) = \frac{P_{U(4)}(x, y)}{n - z(x)} = \Theta(1/n^2)$.
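The counting above can be made concrete for small $n$. The following sketch assumes the $U(4)$ chain picks one of the $n(n-1)/2$ unordered pairs uniformly and, if the pair is not 00, replaces it by one of the 15 non-00 values uniformly; the function names are illustrative only. It counts the pairings that realise a transition $x \to y$ and the resulting uniform flow weight.

```python
from fractions import Fraction
from itertools import combinations

def allowed_pairings(x, y):
    """Pairs (i, j) through which the U(4) chain can move x -> y in one step."""
    n = len(x)
    out = []
    for i, j in combinations(range(n), 2):
        rest_same = all(x[k] == y[k] for k in range(n) if k not in (i, j))
        if rest_same and (x[i], x[j]) != (0, 0) and (y[i], y[j]) != (0, 0):
            out.append((i, j))
    return out

def transition_and_flow(x, y):
    """Return (#pairings, P_U4(x, y), uniform flow weight f) for x != y."""
    n = len(x)
    n_pairs = n * (n - 1) // 2
    k = len(allowed_pairings(x, y))
    p = Fraction(k, 15 * n_pairs)   # choose the pair, then 1 of 15 new values
    return k, p, p / k

examples = {
    "d=2": ((1, 0, 0, 0), (2, 3, 0, 0)),
    "d=1, site non-zero": ((1, 0, 0, 0), (2, 0, 0, 0)),
    "d=1, site zero": ((1, 0, 0, 0), (1, 1, 0, 0)),
}
results = {name: transition_and_flow(x, y) for name, (x, y) in examples.items()}
```

With $n = 4$ the three cases use 1, 3 ($= n - 1$) and 1 ($= n - z(x)$) pairings respectively, and in every case the flow weight per path is $1/90 = 2/(15 n (n-1))$, in line with $f = \Theta(1/n^2)$.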

So for all paths, $f = \Theta(1/n^2)$. We now just need to know how many times each edge
$(a, b)$ in $P$ is used to calculate $A$:

\[ A = \max_{a \neq b,\ P(a,b) \neq 0} A(a, b), \tag{4.20} \]

where

\[ A(a, b) = \frac{1}{P(a, b)} \sum_{x \neq y,\ \gamma_{xy} \in \mathcal{P}_{xy} :\ (a,b) \in \gamma_{xy}} f(\gamma_{xy}). \tag{4.21} \]

We have cancelled the factors of $\pi(x)$ because the stationary distribution is uniform. We
have also ignored the lengths of the paths since they are all constant.

To evaluate $A(a, b)$, we need to know how many paths pass through each edge $(a, b)$.
We again consider the two possibilities separately:

1. $d(a, b) = 2$. Suppose $a$ and $b$ differ at sites $i$ and $j$. Firstly, we need to count how
many transitions from $x$ to $y$ in $P_{U(4)}$ could use this edge, and then how many paths
for each transition actually use the edge.

To find which $x$ and $y$ could use the edge, note that $x$ and $y$ must differ at sites $i$,
$j$ or both. Furthermore, the values at the sites other than $i$ and $j$ must be the same
as for $a$ (and therefore $b$). There is a constant number of $x$, $y$ pairs that satisfy this
condition. Now, for each $x$, $y$ pair satisfying this, paths that use this edge must use
the pairing $i$, $j$ for all transitions. Since in the paths we have chosen above there is
a unique path from $x$ to $y$ for each pairing, there is at most one path for each $x$, $y$
pair that uses edge $(a, b)$.

For $d(a, b) = 2$, $P(a, b) = \Theta(1/n^2)$ so $A(a, b)$ is a constant for this case.

2. $d(a, b) = 1$. Let there be $r$ pairings that give allowed transitions in $P$ between $a$
and $b$. As above, each pairing gives a constant number of paths. So the numerator
is $\Theta(r/n^2)$. Further, $P(a, b) = \Theta(r/n^2)$. So again $A(a, b)$ is constant.

Combining,  A  is  a  constant  so  the  result  is  proven  for  the  case  G  is  symmetric. 

We now turn to the irreversible case. We now need to bound the gap of $PP^* = PP^T$.
This chain selects two (possibly overlapping) pairs at random and applies $G$ to one of
them and $G^T$ to the other. We can use the above exactly by choosing $G$ to perform the
transitions above and $G^T$ to just loop the states back to themselves. By aperiodicity (the
greatest common divisor of loop lengths is 1), we can always find constant length paths
that do this.

Now we need to know the gap of the $U(4)$ chain. We can, by a simple application of
the comparison theorem, show it is $\Omega(1/n^2)$. However, in the second half of this paper
we show it is $\Theta(1/n)$. This gives us (using Theorem 4.1):

Corollary 4.1. The Markov chain $P$ has mixing time $O(n(n + \log 1/\epsilon))$ and 2-norm
mixing time $O(n \log 1/\epsilon)$.
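As a sketch of where the scaling in Corollary 4.1 comes from, one can substitute into Eqs. 4.12 and 4.13, assuming that the relevant gap is at least $c/n$ for some constant $c > 0$ (a placeholder constant, not a value from the paper) and that the stationary distribution is uniform on the $4^n - 1$ non-identity states, so that $\pi_* = 1/(4^n - 1)$:

```latex
\begin{align*}
\tau(\epsilon)
  &\le \frac{2}{\Delta_{PP^*}} \ln \frac{1}{\pi_* \epsilon}
   = \frac{2}{\Delta_{PP^*}} \left( \ln (4^n - 1) + \ln \frac{1}{\epsilon} \right) \\
  &\le \frac{2n}{c} \left( n \ln 4 + \ln \frac{1}{\epsilon} \right)
   = O\!\left( n \left( n + \log \frac{1}{\epsilon} \right) \right), \\
\tau_2(\epsilon)
  &\le \frac{2}{\Delta_{PP^*}} \ln \frac{1}{\epsilon}
   \le \frac{2n}{c} \ln \frac{1}{\epsilon}
   = O\!\left( n \log \frac{1}{\epsilon} \right).
\end{align*}
```

The extra factor of $n$ in the 1-norm bound is entirely due to the $\ln(1/\pi_*)$ term.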

We conjecture that the mixing time (as well as Lemma 4.2) can be tightened to $\Theta(n \log \frac{n}{\epsilon})$,
which is asymptotically the same as for the $U(4)$ case:

Conjecture 4.1. The second moments for the case of general 2-copy gapped distributions
have 1-norm mixing time $\Theta(n \log \frac{n}{\epsilon})$.

It  seems  likely  that  an  extension  of  our  techniques  in  Sect.  5  could  be  used  to  prove  this. 
Combining  the  convergence  results  we  have  proved  our  general  result  Lemma  2.1: 

Proof (of Lemma 2.1). Combining Corollary 4.1 (for the $\gamma(p, p)$ terms) and Lemma
4.2 (for the $\gamma(p_1, p_2)$, $p_1 \neq p_2$ terms) proves the result.

We have now shown that the first and second moments of random circuits converge
quickly. For the remainder of the paper we prove the tight bound for the gap and mixing
time of the $U(4)$ case and show how mixing time bounds relate to the closeness of the
2-design to an exact design. Only for the $U(4)$ case is the matrix $G$ a projector so in
this sense the $U(4)$ random circuit is the most fundamental. While we expect the above
mixing time bound is not tight, we can prove a tight mixing time result for the $U(4)$
case. However, using our definition of an approximate $k$-design, the gap rather than the
mixing time governs the degree of approximation.

5. Tight Analysis for the U(4) Case

We have already found tight bounds for the first moments in Lemma 4.1: just set $\Delta = 1$.


5.1. Second moments convergence. We need to prove a result analogous to Lemma 4.2
for the terms $\sigma_{p_1} \otimes \sigma_{p_2}$, where $p_1 \neq p_2$. We already have a tight bound for the 2-norm
decay, by setting $\Delta = 1$ in Lemma 4.2. We tighten the 1-norm bound:

Lemma 5.1. After $O(n \log \frac{n}{\epsilon})$ steps

\[ \sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1, p_2)| \le \epsilon. \tag{5.1} \]

Proof. We will split the random circuits up into classes depending on how many qubits
have been hit. Let $H$ be the random variable giving the number of different qubits that
have been hit. We can work out the distribution of $H$ and bound the sum of $|\gamma_W(p_1, p_2)|$
for each outcome.

Firstly we have, after $t$ steps,

\[ P(H \le h) \le \binom{n}{h} (h/n)^t. \]

Now, for each qubit hit, each coefficient which has $p_1$ and $p_2$ differing in this place is
set to zero. So after $h$ have been hit, there are only (at most) $16^{n-h}$ terms in the sum in
Eq. 5.1. As before, the state is a physical state, $\operatorname{tr} \rho^2 \le 1$ so $\sum_{p_1 p_2} \gamma^2(p_1, p_2) \le 1$ so
$\sum_{p_1 p_2} |\gamma(p_1, p_2)| \le \sqrt{N}$ if there are at most $N$ non-zero terms in the sum. Therefore
we have, after $t$ steps,

\begin{align*}
\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1, p_2)|
  &\le \sum_{h=1}^{n-1} P(H = h) \, 16^{(n-h)/2} \\
  &\le \sum_{h=1}^{n-1} P(H \le h) \, 4^{n-h} \\
  &\le \sum_{h=1}^{n-1} \binom{n}{h} (h/n)^t \, 4^{n-h} \\
  &\le \sum_{h=1}^{n-1} \binom{n}{h} \exp(-ht/n) \, 4^{h},
\end{align*}

where the last step uses $(h/n)^t \le \exp(-(n-h)t/n)$ and relabels $h \to n - h$.
Now,  let  t  =  n  In  ” :


'  1 7^  P2  h=l  V  7  V  7  V  7  V  7 


Pl^P2 

where  the  last  line  follows  from  the  binomial  theorem.  □ 


This, combined with the mixing time result we prove below, completes the proof that
the second moments of the random circuit converge in time $O(n \log \frac{n}{\epsilon})$.


5.2. Markov chain of coefficients. The Markov chain acting on the coefficients is
reducible because the state $0^n$ is isolated. However, if we remove it then the chain becomes
irreducible. The presence of self loops implies aperiodicity, therefore the chain is
ergodic. We have already seen that the chain converges to the Haar uniform distribution
(in Sect. 1.1), therefore the stationary state is the uniform state $\pi(x) = 1/(4^n - 1)$.
Further, since the chain is symmetric and has uniform stationary distribution, the chain
satisfies detailed balance (Eq. 4.8) so is reversible. We now turn to obtaining bounds on
the mixing time of this chain.

We want to show that the full chain converges to stationarity in time $\Theta(n \log \frac{n}{\epsilon})$. This
implies (see later) that the gap is $\Theta(1/n)$. To prove this, we will construct another chain
called the zero chain. This is the chain that counts the number of zeroes in the state.
Since it is the zeroes that slow down the mixing, this chain will accurately describe the
mixing time of the full chain.


Lemma 5.2. The zero chain has transition matrix $P$ on state space (we count non-zero
positions) $\Omega = \{1, 2, \ldots, n\}$:

\[ P(x, y) = \begin{cases}
1 - \frac{2x(3n - 2x - 1)}{5n(n-1)} & y = x \\
\frac{2x(x-1)}{5n(n-1)} & y = x - 1 \\
\frac{6x(n-x)}{5n(n-1)} & y = x + 1 \\
0 & \text{otherwise}
\end{cases} \tag{5.2} \]

for $1 \le x, y \le n$.

Proof. Suppose there are $n - x$ zeroes (so there are $x$ non-zeroes). Then the only way
the number of zeroes can decrease (i.e. for $x$ to increase) is if a non-zero item is paired
with a zero item and one of the 9 (out of 15) new states is chosen with no zeroes. The
probability of choosing such a pair is $\frac{2x(n-x)}{n(n-1)}$, so the overall probability is
$\frac{9}{15} \cdot \frac{2x(n-x)}{n(n-1)} = \frac{6x(n-x)}{5n(n-1)}$.

The number of zeroes can increase only if a pair of non-zero items is chosen and one
of the 6 states is chosen with one zero. The probability of this occurring is
$\frac{6}{15} \cdot \frac{x(x-1)}{n(n-1)} = \frac{2x(x-1)}{5n(n-1)}$.

The probability of the number of zeroes remaining unchanged is simply calculated
by requiring the probabilities to sum to 1.

We see that the zero chain is a one-dimensional random walk on the line. It is a lazy
random walk because the probability of moving at each step is $< 1$. However, as the
number of zeroes decreases, the probability of moving increases monotonically:

\[ 1 - P(x, x) = \frac{2x(3n - 2x - 1)}{5n(n-1)} \ge \frac{2x}{5n}. \tag{5.3} \]

□
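For small $n$ the transition probabilities of Lemma 5.2 can be confirmed by brute force. The sketch below is an illustration under the assumption that one step of the full chain picks an unordered pair of sites uniformly, does nothing if the pair is 00, and otherwise replaces the pair by one of the 15 non-00 values uniformly; the function names are hypothetical. It computes the distribution of the number of non-zero sites after one step and compares it with Eq. 5.2.

```python
from fractions import Fraction
from itertools import combinations, product

def zero_chain(n, x, y):
    """Transition probability of Eq. 5.2 (x, y count non-zero sites)."""
    denom = 5 * n * (n - 1)
    if y == x - 1:
        return Fraction(2 * x * (x - 1), denom)
    if y == x + 1:
        return Fraction(6 * x * (n - x), denom)
    if y == x:
        return 1 - Fraction(2 * x * (3 * n - 2 * x - 1), denom)
    return Fraction(0)

def lumped_from_full_chain(n, state):
    """Distribution of the number of non-zero sites after one step of the
    full chain started from `state` (a tuple over {0,1,2,3}, not all zero)."""
    pairs = list(combinations(range(n), 2))
    new_values = [v for v in product(range(4), repeat=2) if v != (0, 0)]
    dist = {}
    for i, j in pairs:
        if (state[i], state[j]) == (0, 0):
            outcomes = [(state, Fraction(1))]       # 00 pair: nothing happens
        else:
            outcomes = []
            for a, b in new_values:
                nxt = list(state)
                nxt[i], nxt[j] = a, b
                outcomes.append((tuple(nxt), Fraction(1, 15)))
        for nxt, w in outcomes:
            k = sum(1 for s in nxt if s != 0)
            dist[k] = dist.get(k, Fraction(0)) + w / len(pairs)
    return dist

def lumping_holds(n):
    """Check Eq. 5.2 against every non-zero starting state of length n."""
    for state in product(range(4), repeat=n):
        if not any(state):
            continue
        x = sum(1 for s in state if s != 0)
        dist = lumped_from_full_chain(n, state)
        if any(dist.get(y, Fraction(0)) != zero_chain(n, x, y) for y in range(n + 1)):
            return False
    return True
```

For example, `lumped_from_full_chain(4, (1, 0, 0, 0))` gives probability 3/10 of moving to two non-zero sites, which agrees with $6x(n-x)/(5n(n-1))$ at $n = 4$, $x = 1$.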
Lemma 5.3. The stationary distribution of the zero chain is

\[ \pi_0(x) = \frac{3^x \binom{n}{x}}{4^n - 1}. \tag{5.4} \]

Proof. This can be proven by multiplying the transition matrix in Lemma 5.2 by the
state Eq. 5.4. Alternatively, it can be proven by counting the number of states with $n - x$
zeroes. There are $\binom{n}{x}$ ways of choosing which sites to make non-zero and each non-zero
site can be one of three possibilities: 1, 2 or 3. The total number of states is $4^n - 1$, which
gives the result. □
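The first proof route (multiplying the transition matrix by the claimed distribution) is easy to carry out exactly for small $n$. The sketch below uses exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def zero_chain_row(n, x):
    """Non-trivial entries of row x of the zero chain (Eq. 5.2)."""
    denom = 5 * n * (n - 1)
    down = Fraction(2 * x * (x - 1), denom)
    up = Fraction(6 * x * (n - x), denom)
    return {x - 1: down, x: 1 - down - up, x + 1: up}

def stationary(n):
    """Claimed stationary distribution of Eq. 5.4 on x = 1, ..., n."""
    return {x: Fraction(3**x * comb(n, x), 4**n - 1) for x in range(1, n + 1)}

def is_stationary(n):
    """Check pi P = pi exactly."""
    pi = stationary(n)
    out = {x: Fraction(0) for x in pi}
    for x, px in pi.items():
        for y, p in zero_chain_row(n, x).items():
            if p:                     # skips the zero-weight moves off the ends
                out[y] += px * p
    return out == pi
```

For instance, `stationary(2)` is `{1: 6/15, 2: 9/15}` and `is_stationary(n)` returns `True` for the small values of `n` tried here; the normalisation follows from $\sum_{x=1}^{n} 3^x \binom{n}{x} = 4^n - 1$.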

Below  we  will  prove  the  following  theorem: 

Theorem 5.1. The zero chain mixes in time $\Theta(n \log \frac{n}{\epsilon})$.

The 2-norm mixing time follows easily:


Theorem 5.2. The zero chain has 2-norm mixing time $\Theta(n \log 1/\epsilon)$.


Proof. We use a lower bound on the 1-norm mixing time to show that the gap of the
zero chain is $\Omega(1/n)$ and then use the 2-norm mixing bound Eq. 4.13. In [25], Theorem
4.9, they prove the lower bound:

\[ \tau_1(\epsilon) \ge \frac{1 - \Delta}{\Delta} \ln \frac{1}{2\epsilon}, \tag{5.5} \]

where $\Delta$ is the eigenvalue gap. In Theorem 5.1, we showed $\tau_1(\epsilon) \le Cn \ln \frac{n}{\epsilon}$ for some
constant $C$. Combining,

\[ \frac{1 - \Delta}{\Delta} \ln \frac{1}{2\epsilon} \le Cn \ln \frac{n}{\epsilon} \tag{5.6} \]

for all $\epsilon > 0$. Divide by $\ln 1/\epsilon$ and take the limit $\epsilon \to 0$ to find

\[ \frac{1 - \Delta}{\Delta} \le Cn, \tag{5.7} \]

which implies the gap is $\Omega(1/n)$. The 2-norm bound now follows from Eq. 4.13. □


Before proving Theorem 5.1, we will show how the mixing time of the full chain follows
from this.


Corollary 5.1. The full chain mixes in time $\Theta(n \log \frac{n}{\epsilon})$.

Proof. Once the zero chain has approximately mixed, the distribution of zeroes is almost
correct. We need to prove that the distribution of non-zeroes is correct after $O(n \log \frac{n}{\epsilon})$
steps too.

Once each site of the full chain has been hit, meaning it is chosen and paired with
another site so not both equal zero, the chain has mixed. This is because, after each
site has been hit, the probability distribution over the states is uniform. When the zero
chain has approximately mixed, a constant fraction of sites are zero so the probability
of hitting a site at each step is $\Theta(1/n)$. By the coupon collector argument, each site will
have been hit with probability at least $1 - \epsilon$ in time $O(n \log \frac{n}{\epsilon})$. Once the zero chain has
mixed to $\epsilon'$, we can run the full chain this extra number of steps to ensure each site has
been hit with high probability. Since the mixing of the zero chain only increases with
time, the distance to stationarity of the full chain is now $1 - \epsilon - \epsilon'$. We make this formal
below.

After $t_0 = O(n \log \frac{n}{\epsilon'})$ steps, the number of zeroes is $\epsilon'$-close to the stationary
distribution $\pi_0$ by Theorem 5.1 and only gets closer with more steps since the distance to
stationarity decreases monotonically. The stationary distribution Eq. 5.4 is approximately
a Gaussian peaked at $3n/4$ with $\Theta(n)$ variance. This means that, with high probability,
the number of non-zeroes is close to $3n/4$. We will in fact only need that there is at least
a constant fraction of non-zeroes; with probability at least $1 - \epsilon' - \exp(-\Omega(n))$ there
will be at least $n/2$.

To prove the mixing time, we run the chain for time $t_0$ so the zero chain mixes to
$\epsilon'$. Then run for $t_1$ additional steps. Let $H_{i,t}$ be the event that site $i$ is hit at step $t$. Let
$H_i = \bigcup_{t=t_0+1}^{t_0+t_1} H_{i,t}$ and $H = \bigcap_i H_i$. We want to show $P(H)$ is close to 1, or, in other
words, that all sites are hit with high probability. Further let $X_t$ be the random variable
giving the number of non-zeroes at step $t$.

If at step $t - 1$ site $i$ is non-zero then the event $H_{i,t}$ occurs if the qubit is chosen,
which occurs with probability $2/n$. If, however, it was zero then it must be paired with
a non-zero site for $H_{i,t}$ to hold. Conditioned on any history with $X_{t-1} \ge n/2$, this
probability is $\ge 1/n$. In particular, we can condition on not having previously hit $i$ and
the bound does not change. Combining we have

\[ P\left( H_{i,t}^c \,\middle|\, [X_{t-1} \ge n/2] \cap \bigcap_{t'=t_0+1}^{t-1} H_{i,t'}^c \right) \le 1 - 1/n. \]

Then, after $t_1$ extra steps,

\[ P\left( H_i^c \cap \bigcap_{t=t_0}^{t_0+t_1-1} [X_t \ge n/2] \right) \le (1 - 1/n)^{t_1}, \]

which, using the union bound, gives

\[ P\left( H^c \cap \bigcap_{t=t_0}^{t_0+t_1-1} [X_t \ge n/2] \right) \le n (1 - 1/n)^{t_1}. \]

Now, since the zero chain has mixed to $\epsilon'$,

\[ 1 - P\left( \bigcap_{t=t_0}^{t_0+t_1-1} [X_t \ge n/2] \right) \le t_1 \sum_{x < n/2} \pi_0(x) + \epsilon' \le t_1 \exp(-\Omega(n)) + \epsilon', \]

so

\[ P(H^c) \le n (1 - 1/n)^{t_1} + t_1 \exp(-\Omega(n)) + \epsilon'. \]

Now, choose $t_1 = n \ln \frac{n}{\epsilon}$ so that $P(H^c) \le \delta$, where $\delta = \epsilon + t_1 \exp(-\Omega(n))$. Choose
$\epsilon = 1/n$ so that $\delta$ is $1/\operatorname{poly}(n)$. Now, using the bound on $P(H^c)$, we can write the state
$v$ after $t_1 = O(n \log n)$ steps as

\[ v = (1 - \delta) \pi + \delta \pi', \]

where $\pi$ is the stationary distribution and $\pi'$ is any other distribution. Using this,

\[ \|v - \pi\| \le \delta. \]

We now apply Lemma A.14 to show that after $O(n \log \frac{n}{\epsilon})$ steps the distance to
stationarity of the full chain is $\epsilon$. □
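The coupon-collector term in this argument is simple to check numerically. The sketch below (function names are illustrative) evaluates the union bound $n(1 - 1/n)^{t_1}$ and confirms that $t_1 = \lceil n \ln(n/\epsilon) \rceil$ extra steps push it below $\epsilon$, using $(1 - 1/n)^{t_1} \le e^{-t_1/n}$. It covers only this term, not the $t_1 \exp(-\Omega(n))$ or $\epsilon'$ contributions.

```python
import math

def miss_probability_bound(n, t1):
    """Union bound n (1 - 1/n)^{t1} on the chance that some site is never hit,
    given that each site is hit with probability at least 1/n per step."""
    return n * (1 - 1 / n) ** t1

def extra_steps(n, eps):
    """t1 = ceil(n ln(n / eps)), which makes the bound above at most eps."""
    return math.ceil(n * math.log(n / eps))

# Tabulate (t1, bound) for a few illustrative sizes and tolerances.
checks = {
    (n, eps): (extra_steps(n, eps), miss_probability_bound(n, extra_steps(n, eps)))
    for n in (10, 100, 1000)
    for eps in (0.1, 0.01)
}
```

Each entry of `checks` pairs the number of extra steps with the resulting bound, which stays below the requested `eps`.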


5.3.  Proof  of  Theorem  5.1.  We  will  now  proceed  to  prove  Theorem  5.1.  We  present  an 
outline  of  the  proof  here;  the  details  are  in  Sect.  A. 2. 

Firstly, note that by the coupon collector argument, the lower bound on the time is
$\Omega(n \log n)$. We need to prove an upper bound equal to this. Intuition says that the mixing
time should be $O(n \log n)$ because the walk has to move a distance $\Theta(n)$ and the
waiting time at each step is proportional to $n, n/2, n/3, \ldots$, which sums to $O(n \log n)$,
provided each site is not hit too often. We will show that this intuition is correct using
the Chernoff bound and log-Sobolev (see later) arguments.
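The heuristic can be illustrated with a short calculation. Under the simplifying assumption that the walk visits each position $x = 1, \ldots, n-1$ exactly once on its way up (which is only the intuition, not the proof), the expected total waiting time is $\sum_x 1/(1 - P(x, x))$, and Eq. 5.3 bounds each term by $5n/(2x)$. The function names below are illustrative.

```python
from fractions import Fraction

def moving_probability(n, x):
    """1 - P(x, x) for the zero chain of Lemma 5.2."""
    return Fraction(2 * x * (3 * n - 2 * x - 1), 5 * n * (n - 1))

def expected_total_wait(n):
    """Sum over x = 1..n-1 of the mean waiting time 1 / (1 - P(x, x))."""
    return sum(1 / moving_probability(n, x) for x in range(1, n))

def harmonic_bound(n):
    """(5n/2) * H_{n-1}, obtained from 1 - P(x, x) >= 2x / (5n)."""
    return Fraction(5 * n, 2) * sum(Fraction(1, x) for x in range(1, n))
```

Since the harmonic number $H_{n-1}$ grows like $\ln n$, `harmonic_bound(n)` is $O(n \log n)$, and `expected_total_wait(n)` never exceeds it.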

We  will  first  work  out  concentration  results  of  the  position  after  some  number  of 
accelerated  steps.  The  zero  chain  has  some  probability  of  staying  still  at  each  step. 
The  accelerated  chain  is  the  zero  chain  conditioned  on  moving  at  each  step.  We  define 
the  accelerated  chain  by  its  transition  matrix: 

Definition 5.1. The transition matrix for the accelerated chain is

\[ P_a(x, y) = \begin{cases}
0 & y = x \\
\frac{x - 1}{3n - 2x - 1} & y = x - 1 \\
\frac{3(n - x)}{3n - 2x - 1} & y = x + 1 \\
0 & \text{otherwise}
\end{cases} \tag{5.8} \]
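Eq. 5.8 follows from Eq. 5.2 by dividing the two moving probabilities by $1 - P(x, x)$. The sketch below (illustrative function names) performs that conditioning with exact fractions so it can be compared with the closed form.

```python
from fractions import Fraction

def zero_chain_moves(n, x):
    """(down, up) probabilities P(x, x-1) and P(x, x+1) from Lemma 5.2."""
    denom = 5 * n * (n - 1)
    return Fraction(2 * x * (x - 1), denom), Fraction(6 * x * (n - x), denom)

def accelerated(n, x):
    """Condition the zero chain on moving: divide by 1 - P(x, x)."""
    down, up = zero_chain_moves(n, x)
    move = down + up
    return down / move, up / move

def accelerated_closed_form(n, x):
    """(down, up) probabilities as written in Eq. 5.8."""
    d = 3 * n - 2 * x - 1
    return Fraction(x - 1, d), Fraction(3 * (n - x), d)
```

At the ends of the line the accelerated walk is deterministic: `accelerated(n, 1)` is `(0, 1)` and `accelerated(n, n)` is `(1, 0)`.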


We  use  the  accelerated  chain  in  the  proof  to  firstly  prove  the  accelerated  chain  mixes 
quickly,  then  to  bound  the  waiting  time  at  each  step  to  obtain  a  mixing  time  bound  for 
the  zero  chain. 

To prove the mixing time bound, we will split the walk up into three phases. We will
split the state space into three (slightly overlapping) parts and the phase can begin at
any point within that space. So each phase has a state space $\Omega_i \subseteq [1, n]$, an entry space
$E_i \subseteq \Omega_i$ and an exit condition $T_i$. We say that a phase completes successfully if the exit
condition is satisfied in time $O(n \log n)$ for an initial state within the entry space. When
the exit condition is satisfied, the walk moves onto the next phase.

The phases are:

1. $\Omega_1 = [1, n^\delta]$ for some constant $\delta$ with $0 < \delta < 1/2$. $E_1 = \Omega_1$ (i.e. it can start
anywhere) and $T_1$ is satisfied when the walk reaches $n^\delta$. For this part, the probability
of moving backwards (gaining zeroes) is $O(n^{\delta - 1})$ so the walk progresses forwards
at each step with high probability. This is proven in Lemma A.8. We show that the
waiting time is $O(n \log n)$ in Lemma A.9.

2. $\Omega_2 = [n^\delta / 2, \theta n]$ for some constant $\theta$ with $0 < \theta < 3/4$. $E_2 = [n^\delta, \theta n]$ and $T_2$ is
satisfied when the walk reaches $\theta n$. Here the walk can move both ways with constant
probability but there is a $\Omega(1)$ forward bias. Here we use a monotonicity argument:
the probability of moving forward at each step is

\[ p(x) = \frac{3(n - x)}{3n - 2x - 1} \ge \frac{3(n - x)}{3n - 2x} \ge \frac{3(1 - \theta)}{3 - 2\theta}. \]

If we model this random walk as a walk with constant bias equal to $\frac{3(1 - \theta)}{3 - 2\theta}$ we will
find an upper bound on the mixing time since mixing time increases monotonically
with decreasing bias. Further, the waiting time at $x = a$ stochastically dominates
the waiting time at $x = b$ for $b > a$. The true bias decreases with position so the
walk with constant bias spends more time at the early steps. Thus the position of this
simplified walk is stochastically dominated by the position of the real walk while
the waiting time stochastically dominates the waiting time of the real walk.

3. $\Omega_3 = [\theta n / 2, n]$ and $E_3 = [\theta n, n]$. $T_3$ is satisfied when this restricted part of the chain
has mixed to distance $\epsilon$. Here the bias decreases to zero as the walk approaches $3n/4$
but the moving probability is a constant. We show that this walk mixes quickly by
bounding the log-Sobolev constant of the chain.

Showing  these  three  phases  complete  successfully  will  give  a  mixing  time  bound  for  the 
whole  chain. 

We  now  prove  in  the  Appendix  that  the  phases  complete  successfully  with  probability 
at  least  1  —  1/  poly(n): 


Lemma 5.4.

\[ P(\text{Phase 1 completes successfully}) \ge 1 - n^{2\delta - 1} - 2n^{-\delta}. \]


Lemma  5.5. 


P(Phase  2  completes  successfully)  >  1  —  exp  (  —~[i6n  )  —  (  — 


(-pen)  “  {l) 


3_ 

2n 


-  (q/pfS/2 , 


2  exp 

1  —  exp(— fi/2) 
where  /x  =  ~  1- 

Lemma  5.6. 


P  (Phase  3  completes  successfully)  >  1 
We  can  now  finally  combine  to  prove  our  result: 


(As) 


On/ 1 


Proof (of Theorem 5.1). The stationary distribution has exponentially small weight in
the tail with lots of zeroes. We show that, provided the number of zeroes is within phase
3, the walk mixes in time $O(n \log \frac{n}{\epsilon})$. We also show that if the number of zeroes is
initially within phase 1 or 2, after $O(n \log n)$ steps the walk is in phase 3 with high
probability. We can work out the distance to the stationary distribution as follows.

Let $p_f$ be the probability of failure. This is the sum of the error probabilities in
Lemmas 5.4, 5.5 and 5.6. The key point is that $p_f = 1/\operatorname{poly}(n)$. Then after $O(n \log \frac{n}{\epsilon})$
steps (the sum of the number of steps in the 3 phases), the state is equal to $(1 - p_f) v_3 +
p_f v'$, where $v_3$ is the state in the phase 3 space and $v'$ is any other distribution, which
occurs if any one of the phases fails. Since the distance to stationarity in phase 3 is $\epsilon$,
$\|v_3 - \pi_3\| \le \epsilon$, where $\pi_3$ is the stationary distribution on the state space of phase 3. In
Lemma A.12 we show that $\pi_3(x) = \pi(x)/(1 - w)$, where $w = \sum_{x=1}^{\theta n/2 - 1} \pi(x)$. Since
$\pi(x)$ is exponentially small in this range, $w$ is exponentially small in $n$. Now use the
triangle inequality to find

\[ \|v_3 - \pi\| \le \|v_3 - \pi_3\| + \|\pi_3 - \pi\|. \tag{5.9} \]

Since the chain in phase 3 has mixed to $\epsilon$, the first term is $\le \epsilon$. We can evaluate $\|\pi_3 - \pi\|$:

\begin{align*}
\|\pi_3 - \pi\| &= \frac{1}{2} \sum_{x=1}^{n} |\pi_3(x) - \pi(x)| \\
  &= \frac{1}{2} \left( \sum_{x=1}^{\theta n/2 - 1} \pi(x) + \sum_{x=\theta n/2}^{n} \left( \pi(x)/(1 - w) - \pi(x) \right) \right) \\
  &= \frac{1}{2} \left( w + 1 - (1 - w) \right) = w.
\end{align*}

So now,

\begin{align*}
\|(1 - p_f) v_3 + p_f v' - \pi\| &= \|(1 - p_f)(v_3 - \pi) + p_f (v' - \pi)\| \\
  &\le (1 - p_f) \|v_3 - \pi\| + p_f \|v' - \pi\| \\
  &\le (1 - p_f)(\epsilon + w) + p_f \\
  &\le \delta,
\end{align*}

where $\delta = \epsilon + w + p_f$. We are free to choose $\epsilon$; choose it to be $1/n$ so that $\delta$ is $1/\operatorname{poly}(n)$.
So now the running time to get a distance $\delta$ is $t = O(n \log n)$. We then apply Lemma
A.14 to obtain the result.

This concludes the proof of Theorem 5.1 so Corollary 5.1 is proved. □

We have now proven Lemma 2.1 and consequently Corollary 2.1. We now show how
Theorem 2.2 follows.


6.  Main  Result 

We  will  now  show  how  the  mixing  time  results  imply  that  we  have  an  approximate 
2-design. 

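Proof (of Theorem 2.2). We will go via the 2-norm since this gives a tight bound
when working with the Pauli operators. The supremum can be taken over just physical
states $\rho$ [29]. We write $\rho$ in the Pauli basis as usual (as Eq. 2.3).

\begin{align*}
\|G_W - G_H\|_\diamond^2 &= \sup_\rho \|(G_W \otimes I)(\rho) - (G_H \otimes I)(\rho)\|_1^2 \\
  &\le 2^{4n} \sup_\rho \|(G_W \otimes I)(\rho) - (G_H \otimes I)(\rho)\|_2^2 \\
  &= 2^{4n} \sup_\rho \Big\| \sum_{\substack{p_1, p_2, p_3, p_4 \\ p_1 p_2 \neq 00}} \gamma_0(p_1, p_2, p_3, p_4) \big( G_W(\sigma_{p_1} \otimes \sigma_{p_2}) \otimes \sigma_{p_3} \otimes \sigma_{p_4} \\
  &\qquad\qquad - G_H(\sigma_{p_1} \otimes \sigma_{p_2}) \otimes \sigma_{p_3} \otimes \sigma_{p_4} \big) \Big\|_2^2.
\end{align*}

Now, write (for $p_1 p_2 \neq 00$) $G_W(\sigma_{p_1} \otimes \sigma_{p_2}) = \sum_{q_1, q_2 :\ q_1 q_2 \neq 00} g_t(q_1, q_2; p_1, p_2) \, \sigma_{q_1} \otimes
\sigma_{q_2}$. We get

\begin{align*}
\|G_W - G_H\|_\diamond^2
  &\le 2^{4n} \sup_\rho \Big\| \sum_{\substack{p_1, p_2, p_3, p_4, q_1, q_2 \\ p_1 p_2 \neq 00,\ q_1 q_2 \neq 00}} \gamma_0(p_1, p_2, p_3, p_4) \Big( g_t(q_1, q_2; p_1, p_2) - \frac{\delta_{q_1 q_2} \delta_{p_1 p_2}}{2^n (2^n + 1)} \Big) \\
  &\qquad\qquad \times \sigma_{q_1} \otimes \sigma_{q_2} \otimes \sigma_{p_3} \otimes \sigma_{p_4} \Big\|_2^2 \\
  &= 2^{4n} \sup_\rho \sum_{\substack{p_1, p_2, p_3, p_4, q_1, q_2 \\ p_1 p_2 \neq 00,\ q_1 q_2 \neq 00}} \gamma_0^2(p_1, p_2, p_3, p_4) \Big( g_t(q_1, q_2; p_1, p_2) - \frac{\delta_{q_1 q_2} \delta_{p_1 p_2}}{2^n (2^n + 1)} \Big)^2 \\
  &\le 2^{4n} \sup_\rho \sum_{\substack{p_1, p_2, p_3, p_4 \\ p_1 p_2 \neq 00}} \gamma_0^2(p_1, p_2, p_3, p_4) \, \epsilon^2 \\
  &\le 2^{4n} \epsilon^2,
\end{align*}

where the first equality comes from the orthogonality of the Pauli operators under the
Hilbert-Schmidt inner product and the last inequality comes from the fact that $\rho$ is a
physical state so has $\operatorname{tr} \rho^2 \le 1$. This proves the result for the diamond norm, Definition
2.5. For the distance measure defined in Definition 2.6, the argument in [10] can be used
together with the 1-norm bound to prove the result. □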

It is unfortunate that there is still a dimension factor remaining in the above proof.
To get a distance $\epsilon$ we have to run the random circuit for $O(n(n + \log 1/\epsilon))$ steps.
However, closeness in the diamond-norm may be too stringent a requirement. After
$O(n(n + \log 1/\epsilon))$ steps, the random circuit gives a 2-design in the measure used by
Dankert et al. (see [10] and Definition 2.6). This is in contrast to the $O(n \log 1/\epsilon)$ steps
required by the explicit circuit construction of Dankert et al.


7.  Conclusions 

We  have  proved  tight  convergence  results  for  the  first  two  moments  of  a  random  circuit. 
We  have  used  this  to  show  that  random  circuits  are  efficient  approximate  1-  and  2-unitary 
designs. Our framework readily generalises to $k$-designs for any $k$ and the next step in
this research is to prove that random circuits give approximate $k$-designs for all $k$.

We have shown that, provided the random circuit uses gates from a universal gate set
that is also universal on $U(4)$, the circuit is still an efficient 2-design. We also see that the
random circuit with gates chosen uniformly from $U(4)$ is the most natural model. We
note that the gates from $U(4)$ can be replaced by gates from any approximate 2-design
on two qubits without any change to the asymptotic convergence properties.

One  application  of  this  work  is  to  give  an  efficient  method  of  decoupling  two  quantum 
systems  by  applying  a  random  unitary  from  a  2-design  to  one  system  and  then  discarding 
part  of  it.  This  technique  is  used  in  [2]  to  construct  a  variety  of  encoding  circuits  for 
tasks  in  quantum  Shannon  theory;  thus,  we  (like  [10])  reduce  the  encoding  complexity 
in [2] (and related works, such as [21]) to $O(n^2)$. Unfortunately, the decoding circuits
still  remain  inefficient. 

An algorithmic application of random circuits was given in [19], where they were used
to construct a new class of superpolynomial quantum speedups. In that paper, random
circuits of length $O(n^3)$ were used in order to guarantee that they were
so-called “dispersing” circuits. Our results immediately imply that circuits of length
$O(n^2)$ would instead suffice. We believe that this could be further improved with a
specialised argument, since [19] assumed that the input to the random circuit was always a
computational basis state.

Another  potential  application  of  random  circuits  is  to  model  the  evolution  of  black 
holes  [22].  In  Ref.  [22],  they  conjecture  that  short  random  local  quantum  circuits  are 
approximately  2-designs,  and  thus  can  be  used  for  decoupling  quantum  systems  (as 
in  [2]).  This,  in  turn,  is  used  to  make  claims  about  the  rate  at  which  black  holes  leak 
information. While our model differs from that of Ref. [22] in that they consider
nearest-neighbour interactions and we do not, our techniques and results could be readily
extended  to  cover  the  case  they  consider. 

Finally, random circuits are interesting physical models in their own right. The
original purpose of [26] was to answer the physical question of how quickly
entanglement grows in a system with random two party interactions. Lemma 2.1 (i) shows that
$O(n(n + \log 1/\epsilon))$ steps suffice (in contrast to $O(n^2(n + \log 1/\epsilon))$ which they prove) to
give almost maximal entanglement in such a system.


Acknowledgements We are grateful for funding from the Army Research Office under grant W911NF-05-1-0294,
the European Commission under Marie Curie grants ASTQIT (FP6-022194) and QAP (IST-2005-15848),
and the U.K. Engineering and Physical Science Research Council through “QIP IRC.” We thank
Raphael  Clifford,  Ashley  Montanaro  and  Dan  Shepherd  for  helpful  discussions. 


A.  Appendix 

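A.1. Permutation operators. The following theorems about permutation operators will
be used repeatedly.

Lemma A.1. Let $C$ be a cycle of length $c$ in $S_c$. Then

\[ \operatorname{tr}\big( C (A_1 \otimes A_2 \otimes \cdots \otimes A_c) \big) = \operatorname{tr}\big( A_{C(1)} A_{C^{\circ 2}(1)} A_{C^{\circ 3}(1)} \cdots A_1 \big). \]

Proof. We have

\begin{align*}
\operatorname{tr}\big( C (A_1 \otimes A_2 \otimes \cdots \otimes A_c) \big)
&= \sum_{i_1, i_2, \ldots, i_c} \langle i_1 i_2 \cdots i_c | C (A_1 \otimes A_2 \otimes \cdots \otimes A_c) | i_1 i_2 \cdots i_c \rangle \\
&= \sum_{i_1, i_2, \ldots, i_c} \langle i_1 | A_{C(1)} | i_{C(1)} \rangle \langle i_2 | A_{C(2)} | i_{C(2)} \rangle \cdots \langle i_c | A_{C(c)} | i_{C(c)} \rangle \\
&= \sum_{i_1, i_2, \ldots, i_c} \langle i_1 | A_{C(1)} | i_{C(1)} \rangle \langle i_{C(1)} | A_{C^{\circ 2}(1)} | i_{C^{\circ 2}(1)} \rangle \cdots \langle i_{C^{\circ (c-1)}(1)} | A_1 | i_1 \rangle
\end{align*}

since $C^{\circ c}(1) = 1$. Evaluate the sum using the resolution of the identity to get the
result. □

With this we can work out the Pauli expansion of the swap operator:

Lemma A.2. The swap operator $T$ on two $d$ dimensional systems can be written as

\[ T = \frac{1}{d} \sum_p \sigma_p \otimes \sigma_p, \]

where $\{\sigma_p\}$ form a Hermitian orthogonal basis with $\operatorname{tr} \sigma_p^2 = d$.

Proof. Expand $T$ in the basis and use Lemma A.1:

\[ \operatorname{tr}\big( (\sigma_p \otimes \sigma_q) T \big) = \operatorname{tr} \sigma_p \sigma_q = \begin{cases} d & p = q \\ 0 & \text{otherwise.} \end{cases} \]

The given sum has the correct coefficients in the basis, therefore $\frac{1}{d} \sum_p \sigma_p \otimes \sigma_p = T$. □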
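The expansion in Lemma A.2 can be checked numerically in the smallest case. The sketch below is not part of the original argument: it assumes $d = 2$ with the standard single-qubit Pauli matrices as the basis $\{\sigma_p\}$, and the helper names `kron`, `swap_from_paulis` and `swap_matrix` are purely illustrative.

```python
from itertools import product

D = 2  # local dimension (assumption: a single qubit per system)

# Single-qubit Pauli matrices I, X, Y, Z; each satisfies tr(sigma^2) = D.
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def swap_from_paulis():
    """Return (1/D) * sum_p sigma_p (x) sigma_p as a D^2 x D^2 matrix."""
    size = D * D
    total = [[0j] * size for _ in range(size)]
    for sigma in PAULIS:
        term = kron(sigma, sigma)
        for i, j in product(range(size), repeat=2):
            total[i][j] += term[i][j] / D
    return total


def swap_matrix():
    """Swap operator T, defined directly by T|i>|j> = |j>|i>."""
    size = D * D
    mat = [[0] * size for _ in range(size)]
    for i, j in product(range(D), repeat=2):
        mat[j * D + i][i * D + j] = 1
    return mat
```

Comparing `swap_from_paulis()` with `swap_matrix()` entry by entry gives a quick sanity check of the coefficient $1/d$.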


A.2. Zero chain mixing time proofs.

A.2.1. Asymmetric simple random walk We will use some facts about asymmetric simple
random walks, i.e. a random walk on a 1D line with probability $p$ of moving right
at each step and probability $q = 1 - p$ of moving left.

The position of the walk after $k$ steps is tightly concentrated around $k(p - q)$:

Lemma A.3. Let $X_k$ be the random variable giving the position of a random walk after $k$
steps starting at the origin with probability $p$ of moving right and probability $q = 1 - p$
of moving left. Let $\mu = p - q$. Then for any $\eta > 0$,

\[ \mathbb{P}(X_k \ge \mu k + \eta) \le \exp\left( -\frac{\eta^2}{2k} \right) \]

and

\[ \mathbb{P}(X_k \le \mu k - \eta) \le \exp\left( -\frac{\eta^2}{2k} \right). \]

Proof. The standard Chernoff bound for 0/1 variables $Y_i$ gives, with $Y_i$ equal to 1 with
probability $p$ and for $Y^k = \sum_{i=1}^{k} Y_i$,

\[ \mathbb{P}(Y^k \ge kp + \eta) \le \exp\left( -\frac{2\eta^2}{k} \right), \]

and similarly for the lower tail. For our case, the $i$th step of the walk is $2Y_i - 1$, so that
$X_k = 2Y^k - k$, which gives the desired result. □

This result is for a walk with constant bias. We will need a result for a walk with
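varying (but bounded from below) bias:

Lemma A.4. Let $X_k$ be the random variable giving the position of a random walk after
$k$ steps starting at the origin with probability $p_i \ge p$ of moving right and probability
$q_i = 1 - p_i \le 1 - p$ of moving left at step $i$. Let $\mu = p - (1 - p)$. Then for any $\eta > 0$,

\[ \mathbb{P}(X_k \ge \mu k + \eta) \le \exp\left( -\frac{\eta^2}{2k} \right) \]

and

\[ \mathbb{P}(X_k \le \mu k - \eta) \le \exp\left( -\frac{\eta^2}{2k} \right). \]

Proof. Let $Y_i$ be a random variable equal to 1 with probability $p$ and 0 with probability
$1 - p$. Then let $Z_i$ be a random variable equal to 1 with probability $p_i$ and 0 with
probability $1 - p_i$. Let $Y^k = \sum_{i=1}^{k} Y_i$ and $Z^k = \sum_{i=1}^{k} Z_i$. Then following the standard
Chernoff bound derivation (for $\lambda > 0$),

\begin{align*}
\mathbb{P}(Z^k \ge kp + \eta) &= \mathbb{P}\left( e^{\lambda Z^k} \ge e^{\lambda (kp + \eta)} \right) \\
&\le \frac{\mathbb{E}\, e^{\lambda Z^k}}{e^{\lambda (kp + \eta)}} \\
&\le \frac{\mathbb{E}\, e^{\lambda Y^k}}{e^{\lambda (kp + \eta)}}.
\end{align*}

We can then, as above, use $X_k = 2Z^k - k$. The calculation is similar for the bound on
$\mathbb{P}(X_k \le \mu k - \eta)$. □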
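The constant-bias tail bound of Lemma A.3 can be compared against the exact binomial tail. The following Python sketch is illustrative only: it takes the bound to be $\exp(-\eta^2/(2k))$ and uses the hypothetical helper names `upper_tail` and `chernoff_bound`.

```python
from math import comb, exp


def upper_tail(k, p, eta):
    """Exact P(X_k >= mu*k + eta) for a walk of k independent +/-1 steps
    that move right with probability p, where mu = 2p - 1."""
    mu = 2 * p - 1
    total = 0.0
    for r in range(k + 1):              # r = number of right moves
        position = 2 * r - k            # walk position after k steps
        if position >= mu * k + eta:
            total += comb(k, r) * p**r * (1 - p)**(k - r)
    return total


def chernoff_bound(k, eta):
    """The bound exp(-eta^2 / (2k)) on either tail."""
    return exp(-eta**2 / (2 * k))
```

For any choice of $k$, $p$ and $\eta > 0$, `upper_tail(k, p, eta)` should not exceed `chernoff_bound(k, eta)`; the lower tail follows by exchanging the roles of left and right.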


From Lemma A.3 we can prove a result about how often each site is visited. If the
walk runs for $t$ steps the walk is at position $t\mu$ with high probability so we might expect
from symmetry that each site will have been visited about $1/\mu$ times. Below is a weaker
concentration result of this form, which is nevertheless strong enough for our purposes. It says that the
amount of time spent at positions $\le x$ is about $x/\mu$.

Lemma A.5. For $\gamma > 2$ and integer $x > 0$,

\[ \mathbb{P}\left( \sum_{k=1}^{\infty} I(X_k \le x) \ge \gamma x/\mu \right) \le 2 \exp\left( -\frac{\mu x (\gamma - 2)}{2} \right), \]

where $I$ is the indicator function.

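Proof. Let $Y_k = I(X_k \le x)$. From Lemma A.3,

\[ \mathbb{P}(Y_k = 0) \le \exp\left( -\frac{(k\mu - x)^2}{2k} \right) \]

for $k \le x/\mu$ and

\[ \mathbb{P}(Y_k = 1) \le \exp\left( -\frac{(k\mu - x)^2}{2k} \right) \]

for $k \ge x/\mu$.

Then the quantity to evaluate is

\[ \mathbb{P}\left( \sum_{k=1}^{\infty} Y_k \ge \gamma x/\mu \right). \]

We use a standard trick to split this into two mutually exclusive possibilities and then
bound the probabilities separately. Write

\begin{align*}
\mathbb{P}\left( \sum_{k=1}^{\infty} Y_k \ge \gamma x/\mu \right)
&= \mathbb{P}\left( \sum_{k=1}^{\infty} Y_k \ge \gamma x/\mu \,\cap\, \bigcap_{i=1}^{\gamma x/\mu} [Y_i = 1] \right) \\
&\quad + \mathbb{P}\left( \sum_{k=1}^{\infty} Y_k \ge \gamma x/\mu \,\cap\, \bigcup_{i=1}^{\gamma x/\mu} [Y_i = 0] \right). \tag{A.1}
\end{align*}

We can bound the first term

\begin{align*}
\mathbb{P}\left( \sum_{k=1}^{\infty} Y_k \ge \gamma x/\mu \,\cap\, \bigcap_{i=1}^{\gamma x/\mu} [Y_i = 1] \right)
&\le \mathbb{P}\left( \bigcap_{i=1}^{\gamma x/\mu} [Y_i = 1] \right) \\
&\le \mathbb{P}\left( Y_{\gamma x/\mu} = 1 \right) \\
&\le \exp\left( -\frac{\mu x (\gamma - 1)^2}{2\gamma} \right) \\
&\le \exp\left( -\frac{\mu x (\gamma - 2)}{2} \right).
\end{align*}

The second term is done similarly:

\begin{align*}
\mathbb{P}\left( \sum_{k=1}^{\infty} Y_k \ge \gamma x/\mu \,\cap\, \bigcup_{i=1}^{\gamma x/\mu} [Y_i = 0] \right)
&\le \mathbb{P}\left( \bigcup_{k=\gamma x/\mu + 1}^{\infty} [Y_k = 1] \right) \\
&\le \sum_{k=\gamma x/\mu + 1}^{\infty} \mathbb{P}(Y_k = 1) \\
&\le \sum_{k=\gamma x/\mu + 1}^{\infty} \exp\left( -\frac{(k\mu - x)^2}{2k} \right) \\
&\le \exp\left( -\frac{\mu x (\gamma - 2)}{2} \right).
\end{align*}

The last fact we need about asymmetric simple random walks is a bound on the
probability of going backwards. If $p > q$ then we expect the walk to go right in the majority
of steps. The probability of going left a distance $a$ is exponentially small in $a$. This is a
well known result, often stated as part of the gambler’s ruin problem:

Lemma A.6 (see e.g. [17]). Consider an asymmetric simple random walk that starts
at $a > 0$ and has an absorbing barrier at the origin. The probability that the walk
eventually absorbs at the origin is 1 if $p \le q$ and $(q/p)^a$ otherwise.

This result is for infinitely many steps. If we only consider finitely many steps, the
probability of absorption must be at most this.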
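The remark that a finite number of steps can only lower the absorption probability can be illustrated by evolving the distribution of the walk exactly. This is a minimal sketch, not taken from the original text; the function name `absorption_probability` and the parameter values in the usage note are assumptions chosen for illustration.

```python
def absorption_probability(a, p, steps):
    """Probability that a walk started at a > 0, moving right with
    probability p and left with probability 1 - p, is absorbed at the
    origin within `steps` steps."""
    q = 1 - p
    dist = {a: 1.0}        # dist[x] = probability of being at x, not yet absorbed
    absorbed = 0.0
    for _ in range(steps):
        new = {}
        for x, weight in dist.items():
            if x - 1 == 0:
                absorbed += weight * q          # step left into the barrier
            else:
                new[x - 1] = new.get(x - 1, 0.0) + weight * q
            new[x + 1] = new.get(x + 1, 0.0) + weight * p
        dist = new
    return absorbed
```

For example, with $p = 0.7$ and $a = 3$ the value returned for a few hundred steps stays below $(q/p)^a = (3/7)^3$ and approaches it as the number of steps grows.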


A.2.2. Waiting time From above we saw that the probability of moving is at least $2x/5n$
when at position $x$. The length of time spent waiting at each step is therefore
stochastically dominated by a geometric distribution with parameter $2x/5n$. The following
concentration result will be used to bound the waiting time (in our case $\beta = 2/5$):

Lemma A.7. Let the waiting time at each site be $W(x) \sim \mathrm{Geo}(\beta x/n)$, the total waiting
time $W = \sum_{x=1}^{t} W(x)$ and $t' = (n \ln t)/\beta$. Then

\[ \mathbb{P}(W \ge Ct') \le 2 t^{(1 - C)/2}. \]

Proof. By Markov’s inequality for $\lambda > 0$,

\[ \mathbb{P}(W \ge Ct') \le \frac{\mathbb{E}\, e^{\lambda W}}{e^{\lambda C t'}}. \]

The $W(x)$ are independent so

\[ \mathbb{E}\, e^{\lambda W} = \prod_{x=1}^{t} \mathbb{E}\, e^{\lambda W(x)}. \]

Summing the geometric series we find

\[ \mathbb{E}\, e^{\lambda W(x)} = \frac{\beta x/n}{e^{-\lambda} - 1 + \beta x/n}, \]

provided $e^{\lambda} < 1/(1 - \beta x/n)$ for all $1 \le x \le t$. Therefore $e^{\lambda}$ is of the form $1/(1 - a\beta/n)$, where
$0 < a < 1$. With this,

\[ \mathbb{E}\, e^{\lambda W(x)} = \frac{x}{x - a} \]

and

\[ \mathbb{E}\, e^{\lambda W} = \frac{t!\, \Gamma(1 - a)}{\Gamma(t + 1 - a)}. \]

We are free to choose $a$ within its range to optimise the bound. However, for simplicity,
we will choose $a = 1/2$. From Lemma A.13,

\[ \mathbb{E}\, e^{\lambda W} \le 2\sqrt{t}. \]

The result follows, using the inequality $1 - x \le e^{-x}$. □

A.2.3. Phase 1 Here we prove that phase 1 completes successfully with high probability.
The bias here is large so the walk moves right every time with high probability:

Lemma A.8. The probability that the accelerated chain moves right at each step, starting
from $x = 1$ for $t$ steps, is at least

\[ 1 - t^2/n. \]

Proof. The probability of moving right at each step is

\begin{align*}
\prod_{x=1}^{t} \frac{3(n - x)}{3n - 2x - 1} &= \frac{(n - 2)(n - 3) \cdots (n - t)}{(n - 5/3)(n - 7/3) \cdots (n - (2t + 1)/3)} \\
&\ge (1 - 2/n)(1 - 3/n) \cdots (1 - t/n) \\
&\ge (1 - t/n)^t \ge 1 - t^2/n.
\end{align*}

□
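The product in the proof of Lemma A.8 can be evaluated exactly and compared with the bound $1 - t^2/n$. The sketch below uses exact rational arithmetic; the function name `prob_all_right` is an illustrative choice and not part of the original text.

```python
from fractions import Fraction


def prob_all_right(n, t):
    """Exact probability that the accelerated chain moves right at each of
    t steps starting from x = 1: the product over x = 1..t of
    3(n - x) / (3n - 2x - 1)."""
    prob = Fraction(1)
    for x in range(1, t + 1):
        prob *= Fraction(3 * (n - x), 3 * n - 2 * x - 1)
    return prob
```

The factor for $x = 1$ is exactly 1, which is why the closed form in the proof starts at $(n - 2)$.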

Let $t = n^{\delta}$. Provided $\delta < 1/2$ this probability is close to one. Therefore, with high
probability, the walk moves to $n^{\delta}$ in $n^{\delta}$ steps. Using Lemma A.7 the waiting time can
be bounded:

Lemma A.9. Let $W^{(1)}$ be the waiting time during phase 1. Let $H$ be the event that the
walk moves right at each step. Then

\[ \mathbb{P}\left( W^{(1)} \ge Ct' \mid H \right) \le 2 n^{\delta (1 - C)/2}, \tag{A.2} \]

where $t' = (5 n \delta \ln n)/2$.

Proof. This follows directly from Lemma A.7, since each site is hit exactly once. □
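The moment generating function computation behind Lemma A.7, on which Lemma A.9 relies, can be checked numerically. The following sketch is illustrative: it assumes $\beta = 2/5$ and $a = 1/2$ as in the text, and the function names `mgf_product`, `mgf_gamma` and `tail_bound` are hypothetical.

```python
from math import exp, lgamma, log, sqrt


def mgf_product(t, a):
    """E[e^{lambda W}] as the product over x = 1..t of x / (x - a),
    valid when e^{-lambda} = 1 - a*beta/n."""
    value = 1.0
    for x in range(1, t + 1):
        value *= x / (x - a)
    return value


def mgf_gamma(t, a):
    """Closed form t! * Gamma(1 - a) / Gamma(t + 1 - a), via log-gamma."""
    return exp(lgamma(t + 1) + lgamma(1 - a) - lgamma(t + 1 - a))


def tail_bound(t, n, C, beta=0.4):
    """Markov bound E[e^{lambda W}] * e^{-lambda*C*t'} with a = 1/2 and
    t' = n*ln(t)/beta; Lemma A.7 states this is at most 2*t^((1-C)/2)."""
    lam = -log(1 - beta / (2 * n))      # e^{-lambda} = 1 - beta/(2n)
    t_prime = n * log(t) / beta
    return mgf_product(t, 0.5) * exp(-lam * C * t_prime)
```

Comparing `tail_bound(t, n, C)` with `2 * t ** ((1 - C) / 2)` shows how much slack the final step $1 - x \le e^{-x}$ leaves for moderate $n$.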

We  now  combine  these  two  lemmas  to  prove  that  phase  1  completes  successfully  with 
high  probability: 

Proof (Proof of Lemma 5.4). In Lemma A.8, we show that in $n^{\delta}$ accelerated steps, the
walk moves right at each step with probability $\ge 1 - n^{2\delta - 1}$. Call this event $H$. Then
$\mathbb{P}(H) \ge 1 - n^{2\delta - 1}$. Lemma A.9 shows that the waiting time $W^{(1)}$ is bounded with high
probability (choosing $C = 3$):

\[ \mathbb{P}\left( W^{(1)} \le 15 n \delta \ln n / 2 \mid H \right) \ge 1 - 2 n^{-\delta}. \]

Then we can bound the probability of phase 1 completing successfully:

\begin{align*}
\mathbb{P}(\text{Phase 1 completes successfully}) &\ge \mathbb{P}\left( H \cap W^{(1)} \le 15 n \delta \ln n / 2 \right) \\
&= \mathbb{P}(H)\, \mathbb{P}\left( W^{(1)} \le 15 n \delta \ln n / 2 \mid H \right) \\
&\ge (1 - n^{2\delta - 1})(1 - 2 n^{-\delta}) \\
&\ge 1 - n^{2\delta - 1} - 2 n^{-\delta}.
\end{align*}

□

A.2.4. Phase 2 Phase 2 starts at $n^{\delta}/2$ and finishes when the walk has reached $\theta n$ for
some constant $0 < \theta < 3/4$. We show that, with high probability, this also takes time
$O(n \log n)$. The probability of moving right during this phase is at least $p = \frac{3(1 - \theta)}{3 - 2\theta}$. We
first define some constants that we will derive bounds in terms of. Let $\gamma$ be a constant
$> 2$. Let $\mu = p - (1 - p)$ and $\tilde{\mu} = \mu/\gamma$. Finally let $s = \tilde{\mu} t$ for some $t$ (which will be
the number of accelerated steps). Then, with high probability, the walk will have passed
$s$ after $t$ steps:

Lemma A.10. Let $X_t$ be the position of the walk at accelerated step $t$, where $X_0 = n^{\delta}$.
Then

\[ \mathbb{P}(X_t \le s) \le \exp\left( -\mu^2 t (1 - 1/\gamma)^2 / 2 \right). \]

Proof. Let $X'_t = X_t - n^{\delta}$. Then from Lemma A.4,

\[ \mathbb{P}(X'_t \le \mu t - \eta) \le \exp\left( -\frac{\eta^2}{2t} \right). \]

Now let $\eta = \mu t - s$ and use

\begin{align*}
\mathbb{P}(X_t \le s) &= \mathbb{P}(X'_t \le s - n^{\delta}) \\
&\le \mathbb{P}(X'_t \le s)
\end{align*}

to complete the proof. □


We  now  prove  a  bound  on  the  waiting  time: 

Lemma A.11. Let $W^{(2)}$ be the waiting time in phase 2. Then, assuming the walk does
not go back beyond $n^{\delta}/2$,

\[ \mathbb{P}\left( W^{(2)} \ge \frac{15 n \ln s}{\mu} \right) \le \left( \frac{4}{s} \right)^{3/(2\mu)} + \frac{2 \exp\left( -\mu n^{\delta}/4 \right)}{1 - \exp(-\mu/2)}. \tag{A.3} \]

Proof. Let $W_k \sim \mathrm{Geo}(2X_k/5n)$, where $X_k$ is the position of the walk at accelerated step
$k$ ($X_0 = n^{\delta}$). We want to bound (w.h.p.) the waiting time $W^{(2)} = \sum_{k=1}^{t} W_k$ of $t$ steps
of the accelerated walk.

Define the event $H$ to be

\[ H = \bigcap_{x \ge n^{\delta}/2} \left[ \sum_{k=1}^{\infty} I(X_k \le x) \le x/\tilde{\mu} \right]. \tag{A.4} \]

If $H$ occurs, no sites have been hit too often and the walk has not gone back further than
$n^{\delta}/2$. It is important that we also use the restriction that $X_k \ge n^{\delta}/2$ because the waiting
time grows the longer the walk moves back. However, it is very unlikely that the walk
will go backwards (even to $n^{\delta}/2$).

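We now define some more notation to bound the waiting time. Let $X = (X_1, X_2, \ldots, X_t)$
be a tuple of positions and let $N_x(X)$ be the number of times that $x$ appears in $X$ and let
$N(X) = (N_1(X), N_2(X), \ldots, N_n(X))$. Then we have $\sum_x N_x(X) = t$.

As we said above, the waiting time at $x = a$ stochastically dominates the waiting time
at $x = b$ for $b \ge a$. In other words,

\[ W_k \succeq W_{k'} \quad \text{if } X_k \le X_{k'}, \tag{A.5} \]

where $X \succeq Y$ means that $X$ stochastically dominates $Y$. Now write the waiting time for
all steps:

\begin{align*}
W^{(2)}(X) &= \sum_{k=1}^{t} W_k \\
&= \sum_x \sum_{h=1}^{N_x(X)} W_h(x), \tag{A.6}
\end{align*}

where $W_h(x) \sim \mathrm{Geo}(2x/5n)$.

If event $H$ occurs, we can put some bounds on $N_x$. We find that, for all $x \ge n^{\delta}/2$,

\[ \sum_{y = n^{\delta}/2}^{x} N_y(X) \le x/\tilde{\mu} \tag{A.7} \]

and $N_x(X) = 0$ for $x < n^{\delta}/2$. Now let $X_m$ be such that $N_{n^{\delta}/2}(X_m) = n^{\delta}/(2\tilde{\mu})$ and $N_x(X_m) =
1/\tilde{\mu}$ for $x > n^{\delta}/2$. Then

\[ \sum_{y = n^{\delta}/2}^{x} N_y(X_m) = x/\tilde{\mu}. \tag{A.8} \]

Now we introduce the relation $\preceq$:

Definition A.1. Let $x$ and $y$ be $n$-tuples. Then $x \preceq y$ if

\[ \sum_{i=1}^{k} x_i \le \sum_{i=1}^{k} y_i \tag{A.9} \]

for all $1 \le k \le n$ with equality for $k = n$.

Note that this is like majorisation, except the elements of the tuples are not sorted. Using
this, we find that $N(X) \preceq N(X_m)$. (Using $\sum_y N_y(X) = \sum_y N_y(X') = t$ for all $X$, $X'$.)

If we combine Eqs. A.5 and A.6 we find that $W^{(2)}(X) \succeq W^{(2)}(X')$ if $N(X) \succeq N(X')$.
Roughly speaking, this is simply saying that the waiting time is larger if the earlier sites
are hit more often. But since for all $X$ that satisfy $H$, $X \preceq X_m$, we have $W^{(2)}(X) \preceq
W^{(2)}(X_m)$ provided $H$ occurs. We will simplify further by noting that $X_m \preceq X_0$, where
$N_x(X_0) = 1/\tilde{\mu}$ for $1 \le x \le \tilde{\mu} t = s$ and zero elsewhere. Therefore

\[ \mathbb{P}\left( W^{(2)}(X) \ge \frac{5 C n \ln s}{2 \tilde{\mu}} \;\Big|\; H \right) \le \mathbb{P}\left( W^{(2)}(X_0) \ge \frac{5 C n \ln s}{2 \tilde{\mu}} \right). \]

We can bound this by applying Lemma A.7. Let $W_h = \sum_{x=1}^{s} W_h(x)$. From Lemma A.7,

\[ \mathbb{P}(W_h \ge Ct') \le 2 s^{(1 - C)/2}, \tag{A.10} \]

where $t' = (5 n \ln s)/2$. However, we want a bound on $\mathbb{P}\left( \sum_{h=1}^{1/\tilde{\mu}} W_h \ge Ct'/\tilde{\mu} \right)$. The same
reasoning as in Lemma A.7 bounds this as

\[ \mathbb{P}\left( \sum_{h=1}^{1/\tilde{\mu}} W_h \ge Ct'/\tilde{\mu} \right) \le \left( 2 s^{(1 - C)/2} \right)^{1/\tilde{\mu}}. \tag{A.11} \]

Therefore

\[ \mathbb{P}\left( W^{(2)}(X_0) \ge \frac{5 C n \ln s}{2 \tilde{\mu}} \right) \le \left( 2 s^{(1 - C)/2} \right)^{1/\tilde{\mu}}. \tag{A.12} \]

To complete the proof, we just need to find $\mathbb{P}(H^c)$. We can bound it using the union
bound and Lemma A.5:

\begin{align*}
\mathbb{P}(H^c) &= \mathbb{P}\left( \bigcup_{x = n^{\delta}/2}^{n} \left[ \sum_{k=1}^{\infty} I(X_k \le x) > x/\tilde{\mu} \right] \right) \\
&\le \sum_{x = n^{\delta}/2}^{n} \mathbb{P}\left( \sum_{k=1}^{\infty} I(X_k \le x) > x/\tilde{\mu} \right) \\
&\le \sum_{x = n^{\delta}/2}^{n} 2 \exp\left( -\frac{\mu x (\gamma - 2)}{2} \right) \\
&\le \sum_{x = n^{\delta}/2}^{\infty} 2 \exp\left( -\frac{\mu x (\gamma - 2)}{2} \right) \\
&= \frac{2 \exp\left( -\mu n^{\delta} (\gamma - 2)/4 \right)}{1 - \exp\left( -\mu (\gamma - 2)/2 \right)}.
\end{align*}

Now, for any events $A$ and $B$,

\begin{align*}
\mathbb{P}(A) &= \mathbb{P}(A \cap B) + \mathbb{P}(A \cap B^c) \\
&= \mathbb{P}(A \mid B)\, \mathbb{P}(B) + \mathbb{P}(A \cap B^c) \\
&\le \mathbb{P}(A \mid B) + \mathbb{P}(B^c),
\end{align*}

and set $C = 2$ and $\gamma = 3$ to obtain the result. □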

We  now  combine  these  two  lemmas  to  prove  that  phase  2  completes  successfully  with 
high  probability: 

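Proof (Proof of Lemma 5.5). Phase 2 can fail if:

- The walk does not reach $\theta n$. The probability of this is bounded by Lemma A.10:

  \[ \mathbb{P}(X_t \le \theta n) \le \exp\left( -\tfrac{2}{3} \mu \theta n \right). \]

  This follows from setting $t = 3\theta n/\mu$ and $\gamma = 3$.

- The waiting time is too long. This probability is bounded by Lemma A.11:

  \[ \mathbb{P}\left( W^{(2)} \ge \frac{15 n \ln(\theta n)}{\mu} \right) \le \left( \frac{4}{\theta n} \right)^{3/(2\mu)} + \frac{2 \exp\left( -\mu n^{\delta}/4 \right)}{1 - \exp(-\mu/2)} + (q/p)^{n^{\delta}/2}. \]

- The walk gets back to $n^{\delta}/2$. This is bounded by Lemma A.6:

  \[ \mathbb{P}(\text{Walk gets to } n^{\delta}/2) \le (q/p)^{n^{\delta}/2}. \]

So, using the union bound we can bound the overall probability of failure:

\[ \mathbb{P}(\text{Phase 2 fails}) \le \exp\left( -\tfrac{2}{3} \mu \theta n \right) + \left( \frac{4}{\theta n} \right)^{3/(2\mu)} + \frac{2 \exp\left( -\mu n^{\delta}/4 \right)}{1 - \exp(-\mu/2)} + (q/p)^{n^{\delta}/2}. \]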
A.2.5. Phase 3 This phase starts at $\theta n$. We show that this mixes quickly using
log-Sobolev arguments.

Lemma A.12. The zero chain on the restricted state space $x \in [m, n]$, where $m = \theta n/2$
for $0 < \theta < 3/4$, has mixing time $O(n \log \frac{n}{\epsilon})$.

Proof. We restrict the Markov chain to only run from $m$ by adjusting the holding
probability at $m$, $P(m, m)$. Construct the chain $P'$ with transition matrix

\[ P'(x, y) = \begin{cases} 0 & x < m \text{ or } y < m \\ 1 - P(m, m + 1) & x = y = m \\ P(x, y) & \text{otherwise} \end{cases} \tag{A.13} \]

where $P$ is the transition matrix of the full zero chain. This chain then has stationary
distribution

\[ \pi'(x) = \begin{cases} \pi(x)/(1 - w) & m \le x \le n \\ 0 & \text{otherwise} \end{cases} \tag{A.14} \]

where $w = \sum_{x=1}^{m-1} \pi(x)$. To see this, first note that the distribution is normalised. We
want to show that

\[ \sum_{x=m}^{n} P'(x, y)\, \pi'(x) = \pi'(y). \tag{A.15} \]

When $y = m$ we are required to prove that $P'(m, m)\pi'(m) + P'(m + 1, m)\pi'(m + 1) = \pi'(m)$.
This follows from the reversibility of the unrestricted zero chain, using
$P'(m, m) = 1 - P(m, m + 1)$. For $y > m$, Eq. A.15 is satisfied simply because $\pi(x)$ is
the stationary distribution of $P$ and related by a constant factor to $\pi'(x)$.

We can now prove this final mixing time result, making use of Lemma 4.4. Let $Q_i$ be
the chain that uniformly mixes site $i$. This converges in one step and has a log-Sobolev
constant independent of $n$; call it $\rho_1$. Let $Q$ be the chain that chooses a site at random
and then uniformly mixes that site. This is the product chain of the $Q_i$ so, by Lemma
4.4, has gap $1/n$ and Sobolev constant $\rho_Q = \rho_1/n$. We can construct the zero chain for
this and find its Sobolev constant.

The Sobolev constant is defined (Definition 4.3) in terms of a minimisation over functions
on the state space. For the chain $Q$ we can write

\[ \rho_Q = \inf_{\phi} f(\phi). \]

If we restrict the infimum to be over functions $\phi$ with $\phi(x) = \phi(y)$ for $x$ and $y$
containing the same number of zeroes then we obtain the Sobolev constant for the zero-$Q$
chain, $\rho_{Q_0}$, which is the chain which counts the number of zeroes in the full chain $Q$.
Since taking the infimum over fewer functions cannot give a smaller value,

\[ \rho_{Q_0} \ge \rho_Q \ge \rho_1/n. \]

We can now compare this chain to the zero-$P$ chain. The stationary distributions are the
same. The transition matrix for the zero-$Q$ chain is

\[ Q_0(x, y) = \begin{cases} \frac{n + 2x}{4n} & y = x \\ \frac{x}{4n} & y = x - 1 \\ \frac{3(n - x)}{4n} & y = x + 1 \\ 0 & \text{otherwise.} \end{cases} \]

Then construct $Q'_0$ by restricting the space to only run from $m$ in exactly the same way
as $P'$ is constructed from $P$. $Q'_0$ has the same stationary distribution as $P'$. Now we can
perform the comparison. From Eq. 4.14:

\begin{align*}
A &= \max_{a \ge m} \frac{Q'_0(a, a + 1)}{P'(a, a + 1)} \\
&= \max_{a \ge m} \frac{5(n - 1)}{8a} \le \frac{5}{8\theta}.
\end{align*}

Therefore $\rho_{P'} = \Omega(1/n)$. Exactly the same argument applies to show the gap is $\Omega(1/n)$,
so the mixing time is (from Eq. 4.16) $O(n \log \frac{n}{\epsilon})$. □
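The transition matrix of the zero-$Q$ chain can be sanity-checked with exact arithmetic. The sketch below makes an assumption that is not restated in this appendix: that the stationary weights are proportional to $\binom{n}{x} 3^x$, with $x$ ranging over $0, \ldots, n$. The function names `q0` and `weight` are illustrative.

```python
from fractions import Fraction
from math import comb


def q0(n, x, y):
    """Transition probability Q_0(x, y) of the zero-Q chain."""
    if y == x:
        return Fraction(n + 2 * x, 4 * n)
    if y == x - 1:
        return Fraction(x, 4 * n)
    if y == x + 1:
        return Fraction(3 * (n - x), 4 * n)
    return Fraction(0)


def weight(n, x):
    """Unnormalised stationary weight, ASSUMED here to be C(n, x) * 3^x."""
    return comb(n, x) * 3**x
```

Under this assumption each row of $Q_0$ sums to one and detailed balance, `weight(n, x) * q0(n, x, x + 1) == weight(n, x + 1) * q0(n, x + 1, x)`, holds exactly, which is the reversibility used in the comparison argument.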

Now  we  can  prove  that  phase  3  completes  successfully  with  high  probability: 

Proof (of Lemma 5.6). In Lemma A.12, we show that after $O(n \log \frac{n}{\epsilon})$ steps the chain
mixes to distance $\epsilon$. We just need to show that the walk goes back to $\theta n/2$ with small
probability. This follows from Lemma A.6.


A.3. Moment generating function calculations. The following lemma is needed in the
moment generating function calculations.

Lemma A.13. For integer $s > 0$,

\[ \frac{\Gamma(s + 1)\, \Gamma(1/2)}{\Gamma(s + 1/2)} \le 2\sqrt{s}. \tag{A.16} \]

Proof. From expanding the $\Gamma$ functions, Eq. A.16 becomes

\[ \frac{s!\, 2^s}{(2s - 1)!!} = \frac{2 \times 4 \times 6 \times \cdots \times 2(s - 1) \times 2s}{1 \times 3 \times 5 \times \cdots \times (2s - 3) \times (2s - 1)} = \prod_{x=1}^{s} \frac{2x}{2x - 1} \le 2\sqrt{s}. \]

We then proceed by induction. $\prod_{x=1}^{1} \frac{2x}{2x - 1} = 2$ and by the inductive hypothesis

\[ \prod_{x=1}^{s+1} \frac{2x}{2x - 1} \le \frac{2(s + 1)}{2(s + 1) - 1}\, 2\sqrt{s}. \]

It is easy to show that

\[ \frac{2(s + 1)}{2(s + 1) - 1} \le \sqrt{\frac{s + 1}{s}} \]

and the result follows.

□
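Lemma A.13 can be verified exactly for small $s$ by working with the product form and comparing squares, which avoids floating-point square roots. This is a minimal sketch; the function names `double_factorial_ratio` and `satisfies_bound` are illustrative.

```python
from fractions import Fraction


def double_factorial_ratio(s):
    """The product over x = 1..s of 2x / (2x - 1), which equals
    Gamma(s + 1) * Gamma(1/2) / Gamma(s + 1/2)."""
    value = Fraction(1)
    for x in range(1, s + 1):
        value *= Fraction(2 * x, 2 * x - 1)
    return value


def satisfies_bound(s):
    """Check ratio <= 2*sqrt(s) exactly by comparing ratio^2 with 4s."""
    return double_factorial_ratio(s) ** 2 <= 4 * s
```

The bound holds with equality at $s = 1$, where the product equals 2.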


A.4. Mixing times. We find bounds for the mixing time above that are valid with high
probability. Below we turn these into full mixing time bounds.

Lemma A.14. If after $O(n \log n)$ steps the state $v$ of a random walk satisfies

\[ \|v - \pi\| \le \delta, \]

where $\pi$ is the stationary distribution and $\delta$ is $1/\mathrm{poly}(n)$, then the number of steps
required to be at most a distance $\epsilon$ from stationarity is

\[ O\left( n \log \frac{n}{\epsilon} \right). \]

Proof. Let $s$ be the slowest mixing initial state. Then, after $t = O(n \log n)$ steps we
have at worst the state

\[ (1 - \delta)\pi + \delta s, \]

and if we repeat this $k$ times $\delta$ becomes $\delta^k$. So to get a distance $\epsilon$, $k = \frac{\log \epsilon}{\log \delta}$.
Now we evaluate the mixing time:

\begin{align*}
kt &= O(n \log n)\, \frac{\log \epsilon}{\log \delta} \\
&= O(n \log n)\, \frac{\log 1/\epsilon}{\log 1/\delta} \\
&= O\left( n \max(\log n, \log 1/\epsilon) \right).
\end{align*}

Abstract. The first separation between quantum polynomial time and classical
bounded-error polynomial time was due to Bernstein and Vazirani in 1993. They
first showed an $O(1)$ vs. $\Omega(n)$ quantum-classical oracle separation based on the
quantum Hadamard transform, and then showed how to amplify this into an $n^{O(1)}$
time quantum algorithm and an $n^{\Omega(\log n)}$ classical query lower bound.

We generalize both aspects of this speedup. We show that a wide class of unitary
circuits (which we call dispersing circuits) can be used in place of Hadamards
to obtain an $O(1)$ vs. $\Omega(n)$ separation. The class of dispersing circuits includes
all quantum Fourier transforms (including over nonabelian groups) as well as
nearly all sufficiently long random circuits. Second, we give a general method for
amplifying quantum-classical separations that allows us to achieve an $n^{O(1)}$ vs.
$n^{\Omega(\log n)}$ separation from any dispersing circuit.


1  Background 

Understanding  the  power  of  quantum  computation  relative  to  classical  computation 
is a fundamental question. When we look at which problems can be solved in quantum
but not classical polynomial time, we get a wide range: quantum simulation, factoring,
approximating the Jones polynomial, Pell’s equation, estimating Gauss sums,
period-finding, group order-finding and even detecting some mildly non-abelian
symmetries [Sho97, Hal07, Wat01, FIM+03, vDHI03]. However, when we look at what
algorithmic tools exist on a quantum computer, the situation is not nearly as diverse.
Apart from the BQP-complete problems [AJL06], the main tool for solving most of
these problems is a quantum Fourier transform (QFT) over some group. Moreover, the
successes have been for cases where the group is abelian or close to abelian in some
way. For sufficiently nonabelian groups, there has been no indication that the
transforms are useful even though they can be computed exponentially faster than
classically. For example, while an efficient QFT for the symmetric group has been intensively
studied for over a decade because of its connection to graph isomorphism, it is still
unknown whether it can be used to achieve any kind of speedup over classical
computation [Bea97].

The first separation between quantum computation and randomized computation
was the Recursive Fourier Sampling problem (RFS) [BV97]. This algorithm had two
components, namely using a Fourier transform, and using recursion. Shortly after this,
Simon’s algorithm and then Shor’s algorithm for factoring were discovered, and the
techniques from these algorithms have been the focus of most quantum algorithmic
research since [Sim97, Sho97]. These developed into the hidden subgroup framework.
The hidden subgroup problem is an oracle problem, but solving certain cases of it would
result in solutions for factoring, graph isomorphism, and certain shortest lattice vector
problems. Indeed, it was hoped that an algorithm for graph isomorphism could be
found, but recent evidence suggests that this approach may not lead to one [HMR+06].
As a way to understand new techniques, this oracle problem has been very important,
and it is also one of the very few where super-polynomial speedups have been
found [IMS01, BCvD05].

In comparison to factoring, the RFS problem has received much less attention. The
problem is defined as a property of a tree with labeled nodes and it was proven to be
solvable with a quantum algorithm super-polynomially faster than the best randomized
algorithm. This tree was defined in terms of the Fourier coefficients over $\mathbb{Z}_2^n$. The
definition was rather technical, and it seemed that the simplicity of the Fourier coefficients for
this group was necessary for the construction to work. Even the variants introduced by
Aaronson [Aar03] were still based on the same QFT over $\mathbb{Z}_2^n$, which seemed to indicate
that this particular abelian QFT was a key part of the quantum advantage for RFS.

The  main  result  of  this  paper  is  to  show  that  the  RFS  structure  can  be  generalized 
far  more  broadly.  In  particular,  we  show  that  an  RFS-style  super-polynomial  speedup 
is  achievable  using  almost  any  quantum  circuit,  and  more  specifically,  it  is  also  true 
for any Fourier transform (even nonabelian), not just over $\mathbb{Z}_2^n$. This illustrates a more
general  power  that  quantum  computation  has  over  classical  computation  when  using 
recursion.  The  condition  for  a  quantum  circuit  to  be  useful  for  an  RFS-style  speedup 
is  that  the  circuit  be  dispersing,  a  concept  we  introduce  to  mean  that  it  takes  many 
different  inputs  to  fairly  even  superpositions  over  most  of  the  computational  basis. 

Our  algorithm  should  be  contrasted  with  the  original  RFS  algorithm.  One  of  the  main 
differences  between  classical  and  quantum  computing  is  so-called  garbage  that  results 
from computing. It is important in certain cases, and crucial in recursion-based quantum
algorithms because of quantum superpositions, that intermediate computations are
uncomputed and that errors do not compound. The original RFS paper [BV97] avoided
the error issue by using an oracle problem where every quantum state created from it
had  the  exact  property  necessary  with  no  errors.  Their  algorithm  could  have  tolerated 
polynomially  small  errors,  but  in  this  paper  we  relax  this  significantly.  We  show  that 
even  if  we  can  only  create  states  with  constant  accuracy  at  each  level  of  recursion,  we 
can  still  carry  through  a  recursive  algorithm  which  introduces  new  constant-sized  errors 
a  polynomial  number  of  times. 

The  main  technical  part  of  our  paper  shows  that  most  quantum  circuits  can  be  used  to 
construct  separations  relative  to  appropriate  oracles.  To  understand  the  difficulty  here, 
consider  two  problems  that  occur  when  one  tries  to  define  an  oracle  whose  output  is 
related  to  the  amplitudes  that  result  from  running  a  circuit.  First,  it  is  not  clear  how 
to  implement  such  an  oracle  since  different  amplitudes  have  different  magnitudes,  and 
only  phases  can  be  changed  easily.  Second,  we  need  an  oracle  where  we  can  prove 
that  a  classical  algorithm  requires  many  queries  to  solve  the  problem.  If  the  oracle 
outputs many bits, this can be difficult or impossible to achieve. For example, the matrix
entries of nonabelian groups can quickly reveal which representation is being used. To
overcome these two problems we show that there are binary-valued functions that can
approximate the complex-valued output of quantum circuits in a certain way.

One by-product of our algorithm is related to the Fourier transform of the symmetric
group. Despite some initial promise for solving graph isomorphism, the symmetric
group  QFT  has  still  not  found  any  application  in  quantum  algorithms.  One  instance  of 
our  result  is  the  first  example  of  a  problem  (albeit  a  rather  artificial  one)  where  the  QFT 
over  the  symmetric  group  is  used  to  achieve  a  super-polynomial  speedup. 


2  Statement  of  Results 

Our  main  contributions  are  to  generalize  the  RFS  algorithm  of  [BV97]  in  two  stages. 
First, [BV97] described the problem of Fourier sampling over $\mathbb{Z}_2^n$, which has an $O(1)$
vs. $\Omega(n)$ separation between quantum and randomized complexities. We show that here
the QFT over $\mathbb{Z}_2^n$ can be replaced with a QFT over any group, or for that matter with
almost any quantum circuit. Next, [BV97] turned Fourier sampling into recursive Fourier
sampling  with  a  recursive  technique.  We  will  generalize  this  construction  to  cope  with 
error  and  to  amplify  a  larger  class  of  quantum  speedups.  As  a  result,  we  can  turn  any  of 
the  linear  speedups  we  have  found  into  superpolynomial  speedups. 

Let us now explain each of these steps in more detail. We replace the $O(1)$ vs. $\Omega(n)$
separation based on Fourier sampling with a similar separation based on a more general
problem called oracle identification. In the oracle identification problem, we are given
access to an oracle $O_a : X \to \{0,1\}$ where $a \in A$, for some sets $A$ and $X$ with
$\log|A|, \log|X| = O(n)$. Our goal is to determine the identity of $a$. Further, assume
that we have access to a testing oracle $T_a : A \to \{0,1\}$ defined by $T_a(a') = \delta_{a,a'}$, that
will let us confirm that we have the right answer.$^1$

A quantum algorithm for identifying $a$ can be described as follows: first prepare a
state $|\varphi_a\rangle$ using $q$ queries to $O_a$, then perform a POVM $\{\Pi_{a'}\}_{a' \in A}$ (with $\sum_{a'} \Pi_{a'} \le I$
to allow for the possibility of a “failure” outcome), using no further queries to $O_a$. The
success probability is $\langle\varphi_a|\Pi_a|\varphi_a\rangle$. For our purposes, it will suffice to place an $\Omega(1)$
lower bound on this probability: say that for each $a$, $\langle\varphi_a|\Pi_a|\varphi_a\rangle \ge \delta$ for some constant
$\delta > 0$. On the other hand, any classical algorithm trivially requires $\ge \log(|A|\delta) = \Omega(n)$
oracle calls to identify $a$ with success probability $\ge \delta$. This is because each query
returns only one bit of information. In Theorem 9 we will describe how a large class of
quantum circuits can achieve this $O(1)$ vs. $\Omega(n)$ separation, and in Theorems 11 and
12 we will show specifically that QFTs and most random circuits fall within this class.

Now we describe the amplification step. This is a variant of the [BV97] procedure
in which making an oracle call in the original problem requires solving a sub-problem
from the same family as the original problem. Iterating this $\ell$ times turns query
complexity $q$ into $q^{\Theta(\ell)}$, so choosing $\ell = \Theta(\log n)$ will yield the desired polynomial vs.
super-polynomial separation. We will generalize this construction by defining an amplified
version of oracle identification called recursive oracle identification. This is described
in the next section, where we will see how it gives rise to superpolynomial
speedups from a broad class of circuits.

1 This will later allow us to turn two-sided into one-sided error; unfortunately it also means that
a non-deterministic Turing machine can find $a$ with a single query to $T_a$. Thus, while the oracle
defined in [BV97] is a candidate for placing BQP outside PH, ours will not be able to place BQP
outside of NP. This limitation appears not to be fundamental, but we will leave the problem of
circumventing it to future work.

We  conclude  that  quantum  speedups — even  superpolynomial  speedups — are  much 
more  common  than  the  conventional  wisdom  would  suggest.  Moreover,  as  useful  as 
the  QFT  has  been  to  quantum  algorithms,  it  is  far  from  the  only  source  of  quantum 
algorithmic  advantage. 

3  Recursive  Amplification 

In this section we show that once we are given a constant versus linear separation (for
quantum versus classical oracle identification), we are able to amplify this to a
superpolynomial speedup. We require a much looser definition than in [BV97] because the
constant case can have a large error.

Definition 1. For sets $A$, $X$, let $f : A \times X \to \{0,1\}$ be a function. To set the scale of
the problem, let $|X| = 2^n$ and $|A| = 2^{\Omega(n)}$. Define the set of oracles $\{O_a : a \in A\}$ by
$O_a(x) = f(a,x)$, and the states

$$|\varphi_a\rangle = \frac{1}{\sqrt{|X|}} \sum_{x \in X} (-1)^{f(a,x)} |x\rangle.$$

The single-level oracle identification problem is defined to be the task of determining $a$ given access to
$O_a$. Let $U$ be a family of quantum circuits, implicitly depending on $n$. We say that $U$
solves the single-level oracle identification problem if

$$|\langle a|U|\varphi_a\rangle|^2 \ge \Omega(1)$$

for all sufficiently large $n$ and all $a \in A$. In this case, we define the POVM $\{\Pi_a\}_{a \in A}$
by $\Pi_a = U^\dagger |a\rangle\langle a| U$.

When this occurs, it means that $a$ can be identified from $O_a$ with $\Omega(1)$ success
probability and using a single query. In the next section, we will show how a broad class
of unitaries $U$ (the so-called dispersing unitaries) allow us to construct $f$ for which
$U$ solves the single-level oracle identification problem. There are natural generalizations
to oracle identification problems requiring many queries, but we will not explore
them here.
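A minimal numerical sketch of Definition 1 may help fix ideas. It assumes the illustrative choices $U = H^{\otimes n}$ and $f(a,x) = a \cdot x \bmod 2$ (the setting of [BV97]); the helper names `hadamard`, `phase_state` and `inner_product_bit` are hypothetical and not part of the paper.

```python
import numpy as np

def hadamard(n):
    """Return the 2^n x 2^n matrix of H tensored with itself n times."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    U = np.array([[1.0]])
    for _ in range(n):
        U = np.kron(U, H)
    return U

def phase_state(f, a, n):
    """|phi_a> = |X|^{-1/2} sum_x (-1)^{f(a,x)} |x>, with X = {0,1}^n."""
    signs = np.array([(-1.0) ** f(a, x) for x in range(2 ** n)])
    return signs / np.sqrt(2 ** n)

def inner_product_bit(a, x):
    """Assumed example function f(a, x) = a.x mod 2 (bitwise inner product)."""
    return bin(a & x).count("1") % 2

n = 4
U = hadamard(n)
# Row a of U applied to |phi_a> is the amplitude <a|U|phi_a>.
success = [abs(U[a] @ phase_state(inner_product_bit, a, n)) ** 2
           for a in range(2 ** n)]
```

In this special case every entry of `success` equals 1 up to rounding, so a single query identifies $a$ exactly; the definition only asks for an $\Omega(1)$ lower bound.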

Theorem 2. Suppose we are given a single-level oracle problem with function $f$ and
unitary $U$ running in time $\mathrm{poly}(n)$. Then we can construct a modified oracle problem
from $f$ which can be solved by a quantum computer in polynomial time (and queries),
but requires $n^{\Omega(\log n)}$ queries for any classical algorithm that succeeds with probability
$\frac{1}{2} + n^{-o(\log n)}$.

We  start  by  defining  the  modified  version  of  the  problem  (Definition  3  below),  and 
describing  a  quantum  algorithm  to  solve  it.  Then  in  Theorem  4  we  will  show  that  the 
quantum  algorithm  solves  the  problem  correctly  in  polynomial  time,  and  in  Theorem  6, 
we  will  show  that  randomized  classical  algorithms  require  superpolynomial  time  to 
have a nonnegligible probability of success.

Fig. 1. A depth $k$ node at location $x = (x_1, \ldots, x_k)$ is labeled by its secret $s_x$ and a bit $b_x$. The
secret $s_x$ can be computed from the bits $b_y$ of its children, and once it is known, the bit $b_x$ is
computed from the oracle $O(x, s_x) = b_x$. If $x$ is a leaf then it has no secret and we simply have
$b_x = O(x)$. The goal is to compute the secret bit at the root.

The  recursive  version  of  the  problem  simply  requires  that  another  instance  of  the 
problem  be  solved  in  order  to  access  a  value  at  a  child.  Figure  1  illustrates  the  structure 
of  the  problem. 

Using the notation from Figure 1, the relation between a secret $s_x$ and the bits $b_y$
of its children is given by $b_y = f(s_x, x')$ for the child $y = (x, x')$, where $f$ is the function
from the single-level oracle identification problem. Thus by computing enough of the bits
$b_{y_1}, b_{y_2}, \ldots$ corresponding to children $y_1, y_2, \ldots$ we can solve the single-level oracle
identification problem to find $s_x$. Of course computing the $b_y$ will require finding the secret strings
$s_y$, which requires finding the bits of their children and so on, until we reach the bottom
layer where queries return answer bits without the need to first produce secret strings.

Definition 3. A level-$\ell$ recursive oracle identification problem is specified by $X$, $A$ and
$f$ from a single-level oracle identification problem (Definition 1), any function
$s : \emptyset \cup X \cup X \times X \cup \ldots \cup X^{\ell-1} \to A$, and any final answer $b_\emptyset \in \{0,1\}$. Given these
ingredients, an oracle $O$ is defined which takes inputs in

$$\bigcup_{k=0}^{\ell-1} \left[ X^k \times A \right] \cup X^\ell$$

and returns outputs in $\{0, 1, \mathrm{FAIL}\}$. On inputs $x_1, \ldots, x_k \in X$, $a \in A$ with $1 \le k < \ell$,
$O$ returns

$$O(x_1, \ldots, x_k, a) = f(s(x_1, \ldots, x_{k-1}), x_k) \quad \text{when } a = s(x_1, \ldots, x_k) \tag{1}$$

$$O(x_1, \ldots, x_k, a) = \mathrm{FAIL} \quad \text{when } a \ne s(x_1, \ldots, x_k). \tag{2}$$

If $k = 0$, then $O(s(\emptyset)) = b_\emptyset$ and $O(a) = \mathrm{FAIL}$ if $a \ne s(\emptyset)$. When $k = \ell$,

$$O(x_1, \ldots, x_\ell) = f(s(x_1, \ldots, x_{\ell-1}), x_\ell).$$

The recursive oracle identification problem is to determine $b_\emptyset$ given access to $O$.

Note that the function $s$ gives the values $s_x$ in Figure 1. These values are actually defined
in the oracle and can be chosen arbitrarily at each node. Note also that the oracle
defined here effectively includes a testing oracle, which can determine whether
$a = s(x_1, \ldots, x_k)$ for any $a \in A$, $x_1, \ldots, x_k \in X$ with one query. (When
$x = (x_1, \ldots, x_k)$, we use $s(x_1, \ldots, x_k)$ and $s_x$ interchangeably.) A significant difference
between our construction and that of [BV97] is that the values of $s$ at different nodes
can be set completely independently in our construction, whereas [BV97] had a
complicated consistency requirement.
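The oracle of Definition 3 can be sketched classically as follows. This is an illustrative model only: the class name `RecursiveOracle`, the lazy random sampling of secrets, the brute-force solver `classical_find`, and the example function $f(a,x) = a \cdot x \bmod 2$ are all assumptions made for the demonstration, not constructions from the paper.

```python
import random

FAIL = "FAIL"

class RecursiveOracle:
    """Level-`depth` oracle with secrets s(x_1..x_k) sampled lazily at random."""

    def __init__(self, f, X, A, depth, answer_bit, seed=0):
        self.f, self.X, self.A = f, list(X), list(A)
        self.depth, self.answer_bit = depth, answer_bit
        self.rng = random.Random(seed)
        self.secrets = {}  # node (tuple of length < depth) -> secret in A

    def secret(self, node):
        if node not in self.secrets:
            self.secrets[node] = self.rng.choice(self.A)
        return self.secrets[node]

    def query(self, node, a=None):
        k = len(node)
        if k == self.depth:              # leaf: no key is needed
            return self.f(self.secret(node[:-1]), node[-1])
        if a != self.secret(node):       # wrong key at an internal node
            return FAIL
        if k == 0:                       # root: reveal the final answer bit
            return self.answer_bit
        return self.f(self.secret(node[:-1]), node[-1])

def classical_find(oracle, node=()):
    """Brute force: learn every child bit, then pick the matching secret."""
    bits = {}
    for x in oracle.X:
        child = node + (x,)
        if len(child) == oracle.depth:
            bits[x] = oracle.query(child)
        else:
            bits[x] = oracle.query(child, classical_find(oracle, child))
    matches = [a for a in oracle.A
               if all(oracle.f(a, x) == bits[x] for x in oracle.X)]
    return matches[0]

def f(a, x):
    """Assumed example: inner product of bit strings mod 2."""
    return bin(a & x).count("1") % 2

oracle = RecursiveOracle(f, X=range(4), A=range(4), depth=2, answer_bit=1, seed=1)
answer = oracle.query((), classical_find(oracle))
```

The brute-force solver queries every child at every level, so its cost grows like $|X|^{\ell}$; this is the kind of blow-up that the classical lower bound makes unavoidable and that the quantum algorithm sidesteps.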

The algorithm. Now we turn to a quantum algorithm for the recursive oracle identification
problem. If a quantum computer can identify $a$ with one-sided$^2$ error $1 - \delta$
using time $T$ and $q$ queries in the non-recursive problem, then we will show that the
recursive version can be solved in time $O((q \log(1/\delta))^{\ell} T)$. For concreteness, suppose that
$|\varphi_a\rangle = \frac{1}{\sqrt{|X|}} \sum_{x \in X} (-1)^{f(a,x)} |x\rangle$, so that $q = 1$; the case when $q > 1$ is an easy, but
tedious, generalization. Suppose that our identifying quantum circuit is $U$, so $a$ can be
identified by applying the POVM $\{\Pi_{a'}\}_{a' \in A}$ with $\Pi_{a'} = U^\dagger |a'\rangle\langle a'| U$ to the state $|\varphi_a\rangle$.
The intuitive idea behind our algorithm is as follows: At each level, we find $s(x_1, \ldots, x_k)$
by recursively computing $s(x_1, \ldots, x_{k+1})$ for each $x_{k+1}$ (in superposition) and
using this information to create many copies of $|\varphi_{s(x_1, \ldots, x_k)}\rangle$, from which we can
extract our answer. However, we need to account for the errors carefully so that they do
not blow up as we iterate the recursion. In what follows, we will adopt the convention
that Latin letters in kets (e.g. $|a\rangle, |x\rangle, \ldots$) denote computational basis states, while
Greek letters (e.g. $|\zeta\rangle, |\varphi\rangle, \ldots$) are general states that are possibly superpositions over
many computational basis states. Also, we let the subscript $(k)$ indicate a dependence
on $(x_1, \ldots, x_k)$. The recursive oracle identification algorithm is as follows:

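Algorithm: FIND

Input: $|x_1, \ldots, x_k\rangle|0\rangle$ for $k < \ell$

Output: $a_{(k)} = s(x_1, \ldots, x_k)$ up to error $\varepsilon = (\delta/8)^2$, where $\delta$ is the constant from the oracle. This means
$|x_1, \ldots, x_k\rangle \left[ \sqrt{1 - \varepsilon_{(k)}}\, |0\rangle |a_{(k)}\rangle |\zeta_{(k)}\rangle + \sqrt{\varepsilon_{(k)}}\, |1\rangle |\zeta'_{(k)}\rangle \right]$, where $\varepsilon_{(k)} \le \varepsilon$ and $|\zeta_{(k)}\rangle$ and $|\zeta'_{(k)}\rangle$ are arbitrary.
(We can assume this form without loss of generality by absorbing phases into $|\zeta_{(k)}\rangle$ and $|\zeta'_{(k)}\rangle$.)

1. Create the superposition $\frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} |x_{k+1}\rangle$.

2. If $k + 1 < \ell$ then let $a_{(k+1)} = \mathrm{FIND}(x_1, \ldots, x_{k+1})$ (with error $\le \varepsilon$), otherwise $a_{(k+1)} = \emptyset$.

3. Call the oracle $O(x_1, \ldots, x_{k+1}, a_{(k+1)})$ to apply the phase $(-1)^{f(a_{(k)}, x_{k+1})}$ using the key $a_{(k+1)}$.

4. If $k + 1 < \ell$ then call $\mathrm{FIND}^\dagger$ to (approximately) uncompute $a_{(k+1)}$.

5. We are now left with $|\tilde\varphi_{(k)}\rangle$, which is close to $|\varphi_{s(x_1, \ldots, x_k)}\rangle$.
Repeat steps 1-4 $m = \frac{4}{\delta} \ln\frac{8}{\delta}$ times to obtain $|\tilde\varphi_{(k)}\rangle^{\otimes m}$.

6. Coherently measure $\{\Pi_a\}$ on each copy and test the results (i.e. apply $U$, test the result, and apply $U^\dagger$).

7. If any tests pass, copy the correct $a_{(k)}$ to an output register, along with $|0\rangle$ to indicate success.
Otherwise put a $|1\rangle$ in the output to indicate failure.

8. Let everything else comprise the junk register $|\zeta_{(k)}\rangle$.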

Theorem 4. Calling FIND on $|\emptyset\rangle$ solves the recursive oracle problem in quantum
polynomial time.

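2 One-sided error is a reasonable demand given our access to a testing oracle. Most of these
results go through with two-sided error as well, but for notational simplicity, we will not explore
them here.

Proof. The proof is by backward induction on $k$; we assume that the algorithm returns
with error $\le \varepsilon$ for $k + 1$ and prove it for $k$. The initial step when $k = \ell$ is trivial since
there is no need to compute $a_{(\ell+1)}$, and thus no source of error. If $k < \ell$, then assume
that correctness of the algorithm has already been proved for $k + 1$. Therefore Step 2
leaves the state

$$\frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} |x_{k+1}\rangle \left[ \sqrt{1 - \varepsilon_{(k+1)}}\, |0\rangle |a_{(k+1)}\rangle |\zeta_{(k+1)}\rangle + \sqrt{\varepsilon_{(k+1)}}\, |1\rangle |\zeta'_{(k+1)}\rangle \right].$$

In Step 3, we assume for simplicity that the oracle was called conditional on the success
of Step 2. This yields

$$|\psi'_{(k)}\rangle := \frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} |x_{k+1}\rangle \left[ (-1)^{f(a_{(k)}, x_{k+1})} \sqrt{1 - \varepsilon_{(k+1)}}\, |0\rangle |a_{(k+1)}\rangle |\zeta_{(k+1)}\rangle + \sqrt{\varepsilon_{(k+1)}}\, |1\rangle |\zeta'_{(k+1)}\rangle \right].$$

Now define the state $|\psi_{(k)}\rangle$ by

$$|\psi_{(k)}\rangle := \frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} (-1)^{f(a_{(k)}, x_{k+1})} |x_{k+1}\rangle \left[ \sqrt{1 - \varepsilon_{(k+1)}}\, |0\rangle |a_{(k+1)}\rangle |\zeta_{(k+1)}\rangle + \sqrt{\varepsilon_{(k+1)}}\, |1\rangle |\zeta'_{(k+1)}\rangle \right].$$

Note that

$$\langle \psi'_{(k)} | \psi_{(k)} \rangle = \frac{1}{|X|} \sum_{x_{k+1} \in X} \left( 1 - \varepsilon_{(k+1)} + (-1)^{f(a_{(k)}, x_{k+1})} \varepsilon_{(k+1)} \right).$$

This quantity is real and always $\ge 1 - 2\varepsilon_{(k+1)} \ge \sqrt{1 - 4\varepsilon}$ by the induction hypothesis.
Let

$$|\varphi_{(k)}\rangle := \frac{1}{\sqrt{|X|}} \sum_{x_{k+1} \in X} (-1)^{f(a_{(k)}, x_{k+1})} |x_{k+1}\rangle |0\rangle.$$

Note that $\mathrm{FIND}^\dagger |x_1, \ldots, x_k, \psi_{(k)}\rangle = |x_1, \ldots, x_k, \varphi_{(k)}\rangle$. Thus there exists $\varepsilon_{(k)}$ such
that applying $\mathrm{FIND}^\dagger$ to $|x_1, \ldots, x_k\rangle |\psi'_{(k)}\rangle$ yields

$$|x_1, \ldots, x_k\rangle \left( \sqrt{1 - 4\varepsilon_{(k)}}\, |\varphi_{(k)}\rangle + \sqrt{4\varepsilon_{(k)}}\, |\varphi^{\perp}_{(k)}\rangle \right),$$

where $\langle \varphi_{(k)} | \varphi^{\perp}_{(k)} \rangle = 0$ and $\varepsilon_{(k)} \le \varepsilon$.

We now want to analyze the effects of measuring $\{\Pi_a\}$ when we are given the state

$$|\tilde\varphi_{(k)}\rangle := \sqrt{1 - 4\varepsilon_{(k)}}\, |\varphi_{(k)}\rangle + \sqrt{4\varepsilon_{(k)}}\, |\varphi^{\perp}_{(k)}\rangle$$

instead of $|\varphi_{(k)}\rangle$. If we define $\|M\|_1 = \mathrm{tr} \sqrt{M^\dagger M}$ for a matrix $M$, then
$\| |\tilde\varphi_{(k)}\rangle\langle\tilde\varphi_{(k)}| - |\varphi_{(k)}\rangle\langle\varphi_{(k)}| \|_1 = 4\sqrt{\varepsilon_{(k)}}$ [FvdG99]. Thus

$$\langle \tilde\varphi_{(k)} | \Pi_{a_{(k)}} | \tilde\varphi_{(k)} \rangle \ge \langle \varphi_{(k)} | \Pi_{a_{(k)}} | \varphi_{(k)} \rangle - 4\sqrt{\varepsilon_{(k)}} \ge \delta - 4\sqrt{\varepsilon_{(k)}} \ge \delta/2.$$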


In the last step we have chosen $\varepsilon = (\delta/8)^2$.

Finally, we need to guarantee that with probability $\ge 1 - \varepsilon$ at least one of the tests in
Step 6 passes. After applying $U$ and the test oracle to $|\tilde\varphi_{(k)}\rangle$, we have $\ge \sqrt{\delta/2}$ overlap
with a successful test and $\le \sqrt{1 - \delta/2}$ overlap with an unsuccessful test. When we
repeat this $m$ times, the amplitude in the subspace corresponding to all tests failing is
$\le (1 - \delta/2)^{m/2} \le e^{-m\delta/4}$. If we choose $m = (2/\delta) \ln(1/\varepsilon) = (4/\delta) \ln(8/\delta)$ then the
failure amplitude will be $\le \sqrt{\varepsilon}$, as desired.
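The two numerical facts used in this error analysis can be checked directly. The sketch below assumes an illustrative value $\delta = 0.3$ and a two-dimensional model of the states (only the overlap $\sqrt{1 - 4\varepsilon}$ matters for the trace norm); the variable names are ours.

```python
import numpy as np

delta = 0.3                        # assumed single-level success probability
eps = (delta / 8) ** 2             # per-level error budget
m = (4 / delta) * np.log(8 / delta)

# Two pure states with overlap sqrt(1 - 4*eps), written in a 2-dim subspace.
phi = np.array([1.0, 0.0])
phi_tilde = np.array([np.sqrt(1 - 4 * eps), np.sqrt(4 * eps)])
diff = np.outer(phi_tilde, phi_tilde) - np.outer(phi, phi)

# For a Hermitian matrix the trace norm is the sum of |eigenvalues|.
trace_norm = np.abs(np.linalg.eigvalsh(diff)).sum()

# Amplitude on the "all m tests fail" subspace, to be compared with sqrt(eps).
fail_amplitude = (1 - delta / 2) ** (m / 2)
```

Here `trace_norm` agrees with $4\sqrt{\varepsilon} = \delta/2$ and `fail_amplitude` stays below $\sqrt{\varepsilon} = \delta/8$. In an actual run $m$ would be rounded up to an integer, which only decreases the failure amplitude.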

To analyze the time complexity, first note that the run-time is $O(T)$ times the number
of queries made by the algorithm, and we have assumed that $T$ is polynomial in $n$.
Suppose the algorithm at level $k$ requires $Q(k)$ queries. Then steps 2 and 4 require
$mQ(k+1)$ queries each, steps 3 and 6 require $m$ queries each and together
$Q(k) = 2mQ(k+1) + 2m$. The base case is $k = \ell$, for which $Q(\ell) = 0$, since there are no secret
strings to calculate for the leaves. The total number of queries required for the algorithm
is then $Q(0) \le (2m)^{2\ell}$. If we choose $\ell = \log n$ the quantum query complexity will thus
be at most $n^{2\log 2m} = n^{O(1)}$ and the quantum complexity will be polynomial in $n$ compared
with the $n^{\Omega(\log n)}$ lower bound.
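The query recurrence can be unrolled mechanically. The sketch below uses hypothetical helper names and an integer $m$ for simplicity; it evaluates $Q(k) = 2mQ(k+1) + 2m$ with $Q(\ell) = 0$ and compares it with the geometric sum $\sum_{j=1}^{\ell} (2m)^j$ that the recurrence produces.

```python
def queries(level, depth, m):
    """Q(level) from the recurrence Q(k) = 2m*Q(k+1) + 2m, with Q(depth) = 0."""
    if level == depth:
        return 0
    return 2 * m * queries(level + 1, depth, m) + 2 * m

def geometric_sum(depth, m):
    """(2m) + (2m)^2 + ... + (2m)^depth."""
    return sum((2 * m) ** j for j in range(1, depth + 1))

m, depth = 3, 5
total = queries(0, depth, m)
```

Since $m$ depends only on the constant $\delta$, taking $\ell = \log n$ makes this sum polynomial in $n$.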


This concludes the demonstration of the polynomial-time quantum algorithm. Now we
turn to the classical $n^{\Omega(\log n)}$ lower bound. Our key technical result is the following
lemma:


Lemma 5. Define the recursive oracle identification problem as above, with a function
$f : A \times X \to \{0,1\}$ and a secret $s : \emptyset \cup X \cup X \times X \cup \ldots \cup X^{\ell-1} \to A$ encoded in an
oracle $O$. Fix a deterministic classical algorithm that makes $\le Q$ queries to $O$. Then if
$s$ and ANS are chosen uniformly at random, the probability that ANS is output by the
algorithm is

$$\le \frac{1}{2} + \max\left( \frac{Q}{|A|^{1/3} - Q},\; Q \left( \frac{3}{\log|A|} \right)^{\ell} \right).$$

Using Yao’s minimax principle and plugging in $|A| = 2^{\alpha n}$, $\ell = \log n$ and $Q = n^{o(\log n)}$
readily yields:

Theorem 6. If $\log|A| = n^{\Omega(1)}$ and $\ell = \Omega(\log n)$, then any randomized classical algorithm
using $Q = n^{o(\log n)}$ queries will have at most $\frac{1}{2} + n^{-\Omega(\log n)}$ probability of successfully
outputting ANS.

Proof (of Lemma 5). Let $T = \emptyset \cup X \cup \ldots \cup X^{\ell}$ denote the tree on which the oracle is
defined. We say that a node $x \in T$ has been hit by the algorithm if position $x$ has been
queried together with the correct secret, i.e. $O(x, s(x))$ has been queried.
The only way to obtain information about ANS is for the algorithm to query $\emptyset$
with the appropriate secret; in other words, to hit $\emptyset$.

For $x, y \in T$ we say that $x$ is an ancestor of $y$, and that $y$ is a descendant of $x$, if
$y = x \times z$ for some $z \in T$. If $z \in X$ then we say that $y$ is a child of $x$ and that $x$ is a
parent of $y$. Now define $S \subset T$ to be the set of all $x \in T$ such that $x$ has been hit but
none of $x$’s ancestors have been. Also define a function $d(x)$ to be the depth of a node
$x$; i.e. for all $x \in X^k$, $d(x) = k$. We combine these definitions to declare an invariant

$$Z = \sum_{x \in S} \left( \frac{\log|A|}{3} \right)^{-d(x)}.$$

The key properties of $Z$ we need are that:

1. Initially $Z = 0$.

2. If the algorithm is successful then it terminates with $Z = 1$.

3. Only oracle queries change the value of $Z$.

4. Querying a leaf can add at most $(\log|A|/3)^{-\ell}$ to $Z$.

5. Querying an internal node (i.e. not a leaf) can add at most $2/(|A|^{1/3} - Q)$ to $\mathbb{E}\,Z$,
where $\mathbb{E}$ indicates the expectation over random choices of $s$.

Combining  these  facts  yields  the  desired  bound. 

Properties 1-4 follow directly from the definition (with the inequality in property
4 because it is possible to query a node that has already been hit). To establish
property 5, suppose that the algorithm queries node $x \in T$ and that it has previously hit
$k$ of $x$’s children. This gives us some partial information about $s(x)$. We can model
this information as a partition of $A$ into $2^k$ disjoint sets $A_1, \ldots, A_{2^k}$ (of which some
could be empty). From the $k$ bits returned by the oracle on the $k$ children of $x$ we have
successfully queried, we know not only that $s(x) \in A$, but that $s(x) \in A_i$ for some
$i \in \{1, \ldots, 2^k\}$.

We will now divide the analysis into two cases. Either $k \le \frac{1}{3}\log|A|$ or $k > \frac{1}{3}\log|A|$.
We will argue that in the former case, $|A_i|$ is likely to be large, and so we are unlikely to
successfully guess $s(x)$, while in the latter case even a successful guess will not increase
$Z$. The latter case ($k > \frac{1}{3}\log|A|$) is easier, so we consider it first. In this case, $Z$ only
changes if $x$ is hit in this step and neither $x$ nor any of its ancestors have been previously
hit. Then even though hitting $x$ will contribute $(\log|A|/3)^{-d(x)}$ to $Z$, it will also remove
the $k$ children from $S$ (as well as any other descendants of $x$), which will decrease $Z$
by at least $k(\log|A|/3)^{-d(x)-1} > (\log|A|/3)^{-d(x)}$, resulting in a net decrease of $Z$.
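The bookkeeping behind the invariant $Z$ can be illustrated on a toy tree. In the sketch below, nodes are tuples (the root is the empty tuple), "ancestor" is taken to mean proper ancestor, and $\log|A| = 30$ is an assumed value; the function name `invariant_Z` is hypothetical.

```python
def invariant_Z(hit, log_A):
    """Z = sum over S of (log|A|/3)^(-depth), where S is the set of hit
    nodes none of whose proper ancestors have been hit."""
    hit = set(hit)
    top = [x for x in hit
           if not any(x[:j] in hit for j in range(len(x)))]
    return sum((log_A / 3) ** (-len(x)) for x in top)

log_A = 30                                   # assumed, so log|A|/3 = 10
children = [(0, c) for c in range(11)]       # k = 11 > log|A|/3 hit children
z_before = invariant_Z(children, log_A)      # eleven depth-2 nodes
z_after = invariant_Z(children + [(0,)], log_A)   # parent hit, children drop out
```

With more than $\log|A|/3$ children already hit, hitting the parent replaces their total contribution by a single smaller term, so `z_after < z_before`, in line with the net decrease claimed for this case.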

Now suppose that $k \le \frac{1}{3}\log|A|$. Recall that our information about $s(x)$ can be
expressed by the fact that $s(x) \in A_i$ for some $i \in \{1, \ldots, 2^k\}$. Since the values of $s$
were chosen uniformly at random, we have $\Pr(A_i) = |A_i|/|A|$. Say that a set $A_i$ is bad
if $|A_i| \le |A|^{2/3}/2^k$. Then for a particular bad set $A_i$, $\Pr(A_i) \le |A|^{-1/3} 2^{-k}$. From the
union bound, we see that the probability that any bad set is chosen is $\le |A|^{-1/3}$.

Assume then that we have chosen a good set $A_i$, meaning that conditioned on the
values of the children there are $|A_i| > |A|^{2/3}/2^k \ge |A|^{1/3}$ possible values of $s(x)$.
However, previous failed queries at $x$ may also have ruled out specific possible values of
$s(x)$. There have been at most $Q$ queries at $x$, so there are $\ge |A|^{1/3} - Q$ possible values of
$s(x)$ remaining. (Queries to any other nodes in the graph yield no information on $s(x)$.)
Thus the probability of hitting $x$ is $\le 1/(|A|^{1/3} - Q)$ if we have chosen a good set.
We also have a $\le |A|^{-1/3}$ probability of choosing a bad set, so the total probability of
hitting $x$ (in the $k \le \frac{1}{3}\log|A|$ case) is $\le |A|^{-1/3} + 1/(|A|^{1/3} - Q) \le 2/(|A|^{1/3} - Q)$.
Finally, hitting $x$ will increase $Z$ by at most one, so the largest possible increase of
$\mathbb{E}\,Z$ when querying a non-leaf node is $\le 2/(|A|^{1/3} - Q)$. This completes the proof of
property 5 and thus the Lemma.

4  Dispersing  Circuits 

In this section we define dispersing circuits and show how to construct an oracle problem
with a constant versus linear separation from any such circuit. In the next sections
we will show how to find dispersing circuits. Our strategy for finding speedups will be
to start with a unitary circuit $U$ which acts on $n$ qubits and has size polynomial in $n$. We
will then try to find an oracle for which $U$ efficiently solves the corresponding oracle
identification problem. Next we need to define a state $|\varphi_a\rangle$ that can be prepared with
$O(1)$ oracle calls and has $\Omega(1)$ overlap with $U^\dagger|a\rangle$. This is accomplished by letting
$|\varphi_a\rangle$ be a state of the form $2^{-n/2} \sum_x \pm|x\rangle$. We can prepare $|\varphi_a\rangle$ with only two oracle
calls (or one, depending on the model), but to guarantee that $|\langle a|U|\varphi_a\rangle|$ can be made
large, we will need an additional condition on $U$. For any $a \in A$, $U^\dagger|a\rangle$ should have
amplitude that is mostly spread out over the entire computational basis. When this is
the case, we say that $U$ is dispersing. The precise definition is as follows:

Definition 7. Let $U$ be a quantum circuit on $n$ qubits. For $0 < \alpha, \beta \le 1$, we say that $U$
is $(\alpha, \beta)$-dispersing if there exists a set $A \subseteq \{0,1\}^n$ with $|A| \ge 2^{\alpha n}$ and

$$\sum_{x \in \{0,1\}^n} |\langle a|U|x\rangle| \ge \beta\, 2^{n/2} \tag{3}$$

for all $a \in A$.

Note that the LHS of (3) can also be interpreted as the $L_1$ norm of $U^\dagger|a\rangle$.

The speedup in [BV97] uses $U = H^{\otimes n}$, which is (1,1)-dispersing since
$\sum_x |\langle a|H^{\otimes n}|x\rangle| = 2^{n/2}$ for all $a$. Similarly the QFT over the cyclic group is
(1,1)-dispersing.$^3$ Nonabelian QFTs do not necessarily have the same strong dispersing
properties, but they satisfy a weaker definition that is still sufficient for a quantum speedup.
Suppose that the measurement operator is instead defined as $\Pi_a = U^\dagger (|a\rangle\langle a| \otimes I) U$,
where $a$ is a string on $m$ bits and $I$ denotes the identity operator on $n - m$ bits. Then $U$
still permits oracle identification, but our requirements that $U$ be dispersing are now
relaxed. Here, we give a definition that is loose enough for our purposes, although further
weakening would still be possible.
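The dispersing condition (3) is easy to test numerically for small examples. The sketch below, with hypothetical helper names, takes $A$ to be all basis states (so $\alpha = 1$) and reports the largest $\beta$ for which every row of $U$ satisfies (3); it is applied to $H^{\otimes 3}$, to the cyclic QFT of size 8, and to the identity as a non-dispersing contrast.

```python
import numpy as np

def hadamard(n):
    """H tensored with itself n times."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    U = np.array([[1.0]])
    for _ in range(n):
        U = np.kron(U, H)
    return U

def cyclic_qft(N):
    """Unitary DFT matrix over the cyclic group Z_N."""
    idx = np.arange(N)
    return np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

def dispersing_beta(U):
    """min over a of sum_x |<a|U|x>|, divided by sqrt(dim)."""
    dim = U.shape[0]
    return np.abs(U).sum(axis=1).min() / np.sqrt(dim)

beta_hadamard = dispersing_beta(hadamard(3))
beta_qft = dispersing_beta(cyclic_qft(8))
beta_identity = dispersing_beta(np.eye(8))
```

The first two values equal 1, while the identity gives $\beta = 2^{-n/2}$, which vanishes as $n$ grows: its rows are concentrated on a single basis state rather than spread out.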

Definition 8. Let $U$ be a quantum circuit on $n$ qubits. For $0 < \alpha, \beta \le 1$ and $0 < m \le n$,
we say that $U$ is $(\alpha, \beta)$-pseudo-dispersing if there exists a set $A \subseteq \{0,1\}^m$ with
$|A| \ge 2^{\alpha n}$ such that for all $a \in A$ there exists a unit vector $|\psi\rangle \in \mathbb{C}^{2^{n-m}}$ such that

$$\sum_{x \in \{0,1\}^n} |\langle a|\langle\psi|U|x\rangle| \ge \beta\, 2^{n/2}. \tag{4}$$

This is a weaker property than being dispersing, meaning that any $(\alpha, \beta)$-dispersing
circuit is also $(\alpha, \beta)$-pseudo-dispersing.

We  can  now  state  our  basic  constant  vs.  linear  query  separation. 

Theorem 9. If $U$ is $(\alpha, \beta)$-pseudo-dispersing, then there exists an oracle problem
which can be solved with one query, one use of $U$ and success probability $(2\beta/\pi)^2$.
However, any classical randomized algorithm that succeeds with probability $\ge \delta$ must
use $\ge \alpha n + \log\delta$ queries.

3 Another possible way to generalize [BV97] is to consider other unitaries of the form
$U = A^{\otimes n}$, for $A \in U_2$. However, it is not hard to show that the only way for such a $U$ to be
$(\Omega(1), \Omega(1))$-dispersing is for $A$ to be of the form $e^{i\phi_1\sigma_z} H e^{i\phi_2\sigma_z}$.

Before we prove this Theorem, we state a Lemma about how well states of the form
$2^{-n/2} \sum_x e^{i\phi_x}|x\rangle$ can be approximated by states of the form $2^{-n/2} \sum_x \pm|x\rangle$.

Lemma 10. For any vector $(x_1, \ldots, x_d) \in \mathbb{C}^d$ there exists $(\theta_1, \ldots, \theta_d) \in \{\pm 1\}^d$ such
that

$$\left| \sum_{k=1}^{d} x_k \theta_k \right| \ge \frac{2}{\pi} \sum_{k=1}^{d} |x_k|.$$

The proof is in the full version of the paper [HH08].
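For orientation, one standard way to obtain a bound of the form $\left|\sum_k x_k\theta_k\right| \ge \frac{2}{\pi}\sum_k |x_k|$ is an averaging argument over a global phase $\phi$. The following is a sketch under that assumption and is not necessarily the argument given in [HH08]; the notation $\theta_k(\phi)$ is introduced here for illustration.

```latex
\begin{align*}
\theta_k(\phi) &:= \operatorname{sgn}\,\mathrm{Re}\bigl(e^{-i\phi}x_k\bigr), \\
\Bigl|\sum_{k=1}^{d}\theta_k(\phi)\,x_k\Bigr|
  &\ge \mathrm{Re}\Bigl(e^{-i\phi}\sum_{k=1}^{d}\theta_k(\phi)\,x_k\Bigr)
   = \sum_{k=1}^{d}\bigl|\mathrm{Re}\bigl(e^{-i\phi}x_k\bigr)\bigr|, \\
\frac{1}{2\pi}\int_{0}^{2\pi}\bigl|\mathrm{Re}\bigl(e^{-i\phi}x_k\bigr)\bigr|\,d\phi
  &= |x_k|\cdot\frac{1}{2\pi}\int_{0}^{2\pi}|\cos\phi|\,d\phi
   = \frac{2}{\pi}\,|x_k| .
\end{align*}
```

Summing the last line over $k$ shows that the average over $\phi$ of the right-hand side of the second line is $\frac{2}{\pi}\sum_k |x_k|$, so some $\phi$ achieves at least this value, and the corresponding signs $\theta_k(\phi)$ satisfy the stated inequality.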

Proof of Theorem 9: Since $U$ is $(\alpha, \beta)$-pseudo-dispersing, there exists a set
$A \subseteq \{0,1\}^m$ with $|A| \ge 2^{\alpha n}$ and satisfying (4) for each $a \in A$. The problem will be to
determine $a$ by querying an oracle $O_a(x)$. No matter how we define the oracle, as long
as it returns only one bit per call any classical randomized algorithm making $q$ queries
can have success probability no greater than $2^{q - \alpha n}$ (or else guessing could succeed
with probability $> 2^{-\alpha n}$ without making any queries). This implies the classical lower
bound.

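Given $a \in A$, to define the oracle $O_a$, first use the definition to choose a state $|\psi\rangle$
satisfying (4). Then by Lemma 10, choose a vector $\theta \in \{\pm 1\}^{2^n}$ that (when normalized
to $|\theta\rangle$) will approximate the state $U^\dagger|a\rangle|\psi\rangle$. Define $O_a(x)$ so that
$(-1)^{O_a(x)} = \theta_x = 2^{n/2}\langle x|\theta\rangle$. By construction,

$$|\langle a|\langle\psi|U|\theta\rangle| = 2^{-n/2} \left| \sum_{x} \theta_x \langle a|\langle\psi|U|x\rangle \right| \ge \frac{2}{\pi}\beta, \tag{5}$$

which implies that creating $|\theta\rangle$, applying $U$, and measuring the first register has
probability $\ge (2\beta/\pi)^2$ of yielding the correct answer $a$. □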
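The construction in this proof can be run end to end on a small example. The sketch below assumes $U$ is the cyclic QFT on $N = 8$ points (which is (1,1)-dispersing, so $\beta = 1$ and no ancilla state $|\psi\rangle$ is needed). The sign-selection routine `choose_signs` is one possible constructive realization of Lemma 10, based on trying one rotation angle inside each interval on which the sign pattern is constant; it is our illustration, not the paper's procedure.

```python
import numpy as np

def choose_signs(c):
    """Return theta in {+1,-1}^d with |sum theta_k c_k| >= (2/pi) sum |c_k|.

    The pattern sgn Re(e^{-i phi} c_k) changes only at finitely many angles,
    so it suffices to test one angle strictly between consecutive ones.
    """
    c = np.asarray(c, dtype=complex)
    nonzero = c[np.abs(c) > 0]
    if nonzero.size == 0:
        return np.ones(len(c))
    # Angles (mod pi) at which some Re(e^{-i phi} c_k) vanishes.
    breaks = np.sort(np.mod(np.angle(nonzero) + np.pi / 2, np.pi))
    ends = np.append(breaks, breaks[0] + np.pi)
    best, best_val = None, -1.0
    for phi in (ends[:-1] + ends[1:]) / 2:
        theta = np.where(np.real(np.exp(-1j * phi) * c) >= 0, 1.0, -1.0)
        val = abs(np.sum(theta * c))
        if val > best_val:
            best, best_val = theta, val
    return best

N = 8
idx = np.arange(N)
U = np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)  # cyclic QFT

success = []
for a in range(N):
    theta = choose_signs(U[a])          # oracle bits: (-1)^{O_a(x)} = theta_x
    theta_state = theta / np.sqrt(N)    # the one-query state |theta>
    success.append(abs(U[a] @ theta_state) ** 2)
```

Every entry of `success` is at least $(2/\pi)^2 \approx 0.405$, matching the success probability $(2\beta/\pi)^2$ of Theorem 9 with $\beta = 1$.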

5  Any  Quantum  Fourier  Transform  Is  Pseudo-dispersing 

In  this  section  we  start  with  some  special  cases  of  dispersing  circuits  by  showing  that 
any  Fourier  transform  is  dispersing.  In  the  next  section  we  show  that  most  circuits  are 
dispersing. 

The original RFS paper [BV97] used the fact that $H^{\otimes n}$ is (1,1)-dispersing to obtain
their starting $O(1)$ vs. $\Omega(n)$ separation. The QFT on the cyclic group (or any abelian
group, in fact) is also (1,1)-dispersing. In fact, if we will accept a pseudo-dispersing
circuit, then any QFT will work:

Theorem 11. Let $G$ be a group with irreps $\hat{G}$ and $d_\lambda$ denoting the dimension of
irrep $\lambda$. Then the Fourier transform over $G$ is $(\alpha, 1/\sqrt{2})$-pseudo-dispersing, where
$\alpha = (\log \sum_{\lambda \in \hat{G}} d_\lambda)/\log|G| \ge 1/2$.

Via  Theorem  9  and  Theorem  2,  this  implies  that  any  QFT  can  be  used  to  obtain  a 
superpolynomial  quantum  speedup.  For  most  nonabelian  QFTs,  this  is  the  first  example 
of  a  problem  which  they  can  solve  more  quickly  than  a  classical  computer. 
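To get a feel for the parameter $\alpha$ in Theorem 11, it can be evaluated from the irrep dimensions alone, using $|G| = \sum_\lambda d_\lambda^2$. The sketch below does this for three small groups whose irrep dimensions are standard; the function name `alpha` and the choice of examples are ours.

```python
import math

def alpha(irrep_dims):
    """alpha = log(sum of d_lambda) / log|G|, with |G| = sum of d_lambda^2."""
    order = sum(d * d for d in irrep_dims)
    return math.log(sum(irrep_dims)) / math.log(order)

examples = {
    "Z_8": [1] * 8,            # abelian: all irreps one-dimensional
    "S_3": [1, 1, 2],
    "S_4": [1, 1, 2, 3, 3],
}
alphas = {name: alpha(dims) for name, dims in examples.items()}
```

Abelian groups give $\alpha = 1$, while $S_3$ and $S_4$ give roughly 0.77 and 0.72. The bound $\alpha \ge 1/2$ follows from $\sum_\lambda d_\lambda \ge \sqrt{\sum_\lambda d_\lambda^2} = \sqrt{|G|}$.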


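Proof (Proof of Theorem 11). Let $A = \{(\lambda, i) : \lambda \in \hat{G}, i \in \{1, \ldots, d_\lambda\}\}$. Let $V_\lambda$ denote the representation space corresponding to an irrep $\lambda \in \hat{G}$. The Fourier transform on $G$ maps vectors in $\mathbb{C}[G]$ to superpositions of vectors of the form

$|\lambda\rangle|v_1\rangle|v_2\rangle$ for $|v_1\rangle, |v_2\rangle \in V_\lambda$.

Fix a particular choice of $\lambda$ and $|i\rangle \in V_\lambda$. If $U$ denotes the QFT on $G$ then let

\[ \rho = U^\dagger \left( |\lambda\rangle\langle\lambda| \otimes |i\rangle\langle i| \otimes \frac{I_{V_\lambda}}{d_\lambda} \right) U. \]

Define $V := \operatorname{supp} \rho$, and let $E_{|\psi\rangle \in V}$ denote an expectation over $|\psi\rangle$ chosen uniformly at random from unit vectors in $V$ (see footnote 4). Finally, let $\Pi$ be the projector onto $V$. Note that

\[ \rho = \Pi/d_\lambda = E_\psi |\psi\rangle\langle\psi|. \]

Because of the invariance of $\rho$ under right-multiplication by group elements (i.e. $\langle g_1|\rho|g_2\rangle = \langle g_1 h|\rho|g_2 h\rangle$ for all $g_1, g_2, h \in G$), we have for any $g$ that

\[ \langle g|\rho|g\rangle = \frac{1}{|G|} \sum_{h \in G} \langle gh|\rho|gh\rangle = \frac{1}{|G|} \operatorname{tr}(\rho) = \frac{1}{|G|}. \tag{6} \]

Since $E_\psi |\psi\rangle\langle\psi| = \rho$, (6) implies that

\[ E_{|\psi\rangle \in V} |\langle g|\psi\rangle|^2 = \langle g|\rho|g\rangle = \frac{1}{|G|}. \]

Next, we would like to analyze $E_\psi |\langle g|\psi\rangle|^4$.

\[ E_\psi |\langle g|\psi\rangle|^4 = E_\psi \operatorname{tr} \left(|g\rangle\langle g| \otimes |g\rangle\langle g|\right) \cdot \left(|\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|\right) \tag{7} \]
\[ = \operatorname{tr} \left(|g\rangle\langle g| \otimes |g\rangle\langle g|\right) \frac{(I + \mathrm{SWAP})(\Pi \otimes \Pi)}{d_\lambda(d_\lambda + 1)} \tag{8} \]
\[ \leq \operatorname{tr} \left(|g\rangle\langle g| \otimes |g\rangle\langle g|\right) \cdot (I + \mathrm{SWAP})(\rho \otimes \rho) \tag{9} \]
\[ = 2\left(\langle g|\rho|g\rangle\right)^2 = \frac{2}{|G|^2}. \tag{10} \]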


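To prove the equality on the second line, we use a standard representation-theoretic trick (cf. section V.B of [PSW06]). First note that $|\psi\rangle^{\otimes 2}$ belongs to the symmetric subspace of $V \otimes V$, which is a $\frac{d_\lambda(d_\lambda+1)}{2}$-dimensional irrep of $\mathcal{U}_{d_\lambda}$. Since $E_\psi |\psi\rangle\langle\psi|^{\otimes 2}$ is invariant under conjugation by $u \otimes u$ for any $u \in \mathcal{U}_{d_\lambda}$, it follows that $E_\psi |\psi\rangle\langle\psi|^{\otimes 2}$ is proportional to a projector onto the symmetric subspace of $V^{\otimes 2}$. Finally, $\mathrm{SWAP}\,\Pi^{\otimes 2}$ has eigenvalue 1 on the symmetric subspace of $V^{\otimes 2}$ and eigenvalue $-1$ on its orthogonal complement, the antisymmetric subspace of $V^{\otimes 2}$. Thus, $\frac{I + \mathrm{SWAP}}{2}\,\Pi^{\otimes 2}$ projects onto the symmetric subspace and we conclude that

\[ E_\psi |\psi\rangle\langle\psi|^{\otimes 2} = \frac{(I + \mathrm{SWAP})\,\Pi^{\otimes 2}}{d_\lambda(d_\lambda + 1)}. \]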

4 We can think of $|\psi\rangle$ either as the result of applying a Haar uniform unitary to a fixed unit vector, or as obtained by choosing $|\psi'\rangle$ from any rotationally invariant ensemble (e.g. choosing the real and imaginary part of each component to be an i.i.d. Gaussian with mean zero) and setting $|\psi\rangle = |\psi'\rangle/\sqrt{\langle\psi'|\psi'\rangle}$.

Now we note the inequality

\[ E|Y| \geq (E Y^2)^{3/2}/(E Y^4)^{1/2}, \tag{11} \]

which holds for any random variable $Y$ and can be proved using Hölder's inequality [Ber97]. Setting $Y = |\langle g|\psi\rangle|$, we can bound $E_\psi |\langle g|\psi\rangle| \geq 1/\sqrt{2|G|}$. Summing over $g \in G$, we find

\[ E_\psi \sum_{g \in G} |\langle g|\psi\rangle| \geq \sqrt{|G|/2}. \]

Finally, because this last inequality holds in expectation, it must also hold for at least some choice of $|\psi\rangle$. Thus there exists $|\psi\rangle \in V$ such that

\[ \sum_{g \in G} |\langle g|\psi\rangle| \geq \sqrt{|G|/2}. \]

Then $U$ satisfies the pseudo-dispersing condition in (4) for the state $|\psi\rangle$ with $\beta = 1/\sqrt{2}$.

This construction works for each $\lambda \in \hat{G}$ and for $|i\rangle$ running over any choice of basis of $V_\lambda$. Together, this comprises $\sum_{\lambda \in \hat{G}} d_\lambda$ vectors in the set $A$.
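The arithmetic behind the last step of the proof can be checked with a short Python sketch. It only evaluates inequality (11) with the two moment bounds derived above; the function names are illustrative and not part of the original text.

```python
import math

def l1_lower_bound(second_moment, fourth_moment):
    """Lower bound on E|Y| from E|Y| >= (E Y^2)^(3/2) / (E Y^4)^(1/2)."""
    return second_moment ** 1.5 / math.sqrt(fourth_moment)

def qft_amplitude_bound(group_order):
    """Plug in E Y^2 = 1/|G| and the upper bound E Y^4 <= 2/|G|^2.

    Using an upper bound on the fourth moment is safe because the
    right-hand side of (11) decreases as the fourth moment grows.
    """
    second = 1.0 / group_order
    fourth = 2.0 / group_order ** 2
    return l1_lower_bound(second, fourth)

def summed_bound(group_order):
    """Sum the per-element bound over all |G| group elements."""
    return group_order * qft_amplitude_bound(group_order)
```

For any group order the per-element bound evaluates to $1/\sqrt{2|G|}$ and the summed bound to $\sqrt{|G|/2}$, which is the $\beta = 1/\sqrt{2}$ scaling used above.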

6  Most  Circuits  Are  Dispersing 

Our final, and most general, method of constructing dispersing circuits is simply to choose a polynomial-size random circuit. We define a length-$t$ random circuit to consist of performing the following steps $t$ times.

1. Choose two distinct qubits $i, j$ at random from $[n]$.

2. Choose a Haar-distributed random $U \in \mathcal{U}_4$.

3. Apply $U$ to qubits $i$ and $j$.
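The three steps above can be simulated directly for small $n$. The following NumPy sketch is one possible classical simulation, not the construction analyzed in the text: it stores the full $2^n$-dimensional state vector, samples Haar unitaries with the usual QR-based recipe, and starts from $|0\ldots0\rangle$ (an assumption made only for illustration). All function names are hypothetical.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Sample a Haar-distributed unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((dim, dim))
         + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # fix column phases so the distribution is Haar

def apply_two_qubit_gate(state, gate, i, j, n):
    """Apply a 4x4 gate to qubits i and j of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(psi, (i, j), (0, 1)).reshape(4, -1)
    psi = (gate @ psi).reshape([2, 2] + [2] * (n - 2))
    psi = np.moveaxis(psi, (0, 1), (i, j))
    return psi.reshape(-1)

def random_circuit_state(n, t, seed=0):
    """Run a length-t random circuit on |0...0> and return the final state."""
    rng = np.random.default_rng(seed)
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for _ in range(t):
        i, j = rng.choice(n, size=2, replace=False)  # step 1
        gate = haar_unitary(4, rng)                  # step 2
        state = apply_two_qubit_gate(state, gate, int(i), int(j), n)  # step 3
    return state
```

The quantity `np.sum(np.abs(state))` is the $L_1$-norm of the output amplitudes, which can be at most $2^{n/2}$ for a normalized state.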

A similar model of random circuits was considered in [DOP07]. Our main result about these random circuits is the following theorem.

Theorem 12. For any $\alpha, \beta > 0$, there exists a constant $C$ such that if $U$ is a random circuit on $n$ qubits of length $t = Cn^3$ then $U$ is $(\alpha, \beta)$-dispersing with probability

\[ \geq 1 - \frac{2\beta^2}{1 - 2^{-n(1-\alpha)}}. \]


Theorem 12 is proved in the extended version of this paper [HH08]. The idea of the proof is to reduce the evolution of the fourth moments of the random circuit (i.e. quantities of the form $E_U \operatorname{tr} U M_1 U^\dagger M_2 U M_3 U^\dagger M_4$) to a classical Markov chain, using the approach of [DOP07]. Then we show that this Markov chain has a gap of $\Omega(1/n^2)$, so that circuits of length $O(n^3)$ have fourth moments nearly identical to those of Haar-uniform unitaries from $\mathcal{U}_{2^n}$. Finally, we use (11), just as we did for quantum Fourier transforms, to show that a large fraction of inputs are likely to be mapped to states with large $L_1$-norm. This will prove Theorem 12 and show that superpolynomial quantum speedups can be built by plugging almost any circuit into the recursive framework we describe in Section 3.

A key component of quantum algorithms is their ability to reveal information stored in nonlocal degrees of freedom. In particular, one of the most important building blocks known is the quantum Fourier transform (QFT) [1], an efficient circuit construction for conversion between discrete position and momentum bases. The QFT converts a vector of $2^n$ amplitudes in $O(n^2)$ steps, in contrast to the $O(n 2^n)$ steps required classically.

Another  elementary  basis  change  important  in  quantum 
physics  is  between  independent  local  states  and  those  of 
definite  total  generalized  angular  momentum.  When  two 
identical  spin- 1/2  particles  interact  with  a  global  excita¬ 
tion,  due  to  their  permutation  symmetry  they  appear  as  a 
singlet  or  a  triplet  to  the  external  interaction.  When  this 
basis is generalized to $n$ $d$-dimensional systems ($n$ “qudits”), we call it the Schur basis and call the unitary
transformation  between  local  and  Schur  bases  the  Schur 
transform. 

The  Schur  transform  is  central  to  a  plethora  of  quantum 
information  protocols  and  to  many  optimal  physical  meth¬ 
ods  for  extracting  information  or  resources  from  a  quantum 
system.  These  include  methods  to  estimate  the  spectrum  of 
a  density  operator  [2],  perform  quantum  hypothesis  testing 
[3],  perform  universal  quantum  source  coding  [4],  concen¬ 
trate  entanglement  noiselessly  [5],  create  states  immune  to 
collective  decoherence  [6],  and  communicate  without  a 
shared  reference  frame  [7].  For  all  of  these  tasks  (and 
others),  inefficient  protocols  also  exist  that  work  in  local 
bases;  however,  only  the  protocols  using  the  Schur  basis 
are optimal. This suggests that the Schur basis is a natural way to treat quantum states based on independent and identically distributed random variables, i.e., for experiments in which many copies of a single quantum state
are  given.  However,  unlike  the  QFT,  no  efficient  algorithm 
for  the  Schur  transform  has  been  found,  rendering  proto¬ 
cols  which  use  it  nonconstructive.  If  we  wish  to  implement 
the  Schur  transform  in  the  lab  to  solve  any  of  the  problems 
listed  above,  an  explicit  efficient  circuit  construction  for 
the  Schur  transform  is  needed. 

Here, we resolve this problem by giving an efficient construction of the Schur transform on $n$ qudits, for arbitrary $n$ and $d$. This is achieved using a quantum circuit of size $\mathrm{poly}[n, d, \log(1/\epsilon)]$ for accuracy $\epsilon$. We believe that
this  basis  change  is  important  not  only  for  quantum  infor¬ 
mation  and  useful  for  extracting  information  about  physi¬ 
cal  systems,  but  also  as  a  new  building  block  for  future 
quantum  algorithms. 

The Schur transform. Consider a system of $n$ qudits, each with a standard local (“computational”) basis $|i\rangle$, $i = 1 \ldots d$. The Schur transform relates transforms on the system performed by local $d$-dimensional unitary operations to those performed by permutation of the qudits. Recall that the symmetric group $\mathcal{S}_n$ is the group of all permutations of $n$ objects. This group is naturally represented in our system by

\[ \mathbf{P}(\pi)|i_1 i_2 \cdots i_n\rangle = |i_{\pi^{-1}(1)} i_{\pi^{-1}(2)} \cdots i_{\pi^{-1}(n)}\rangle, \tag{1} \]

where $\pi \in \mathcal{S}_n$ is a permutation and $|i_1 i_2 \ldots\rangle$ is shorthand for $|i_1\rangle \otimes |i_2\rangle \otimes \cdots$. Let $\mathcal{U}_d$ denote the group of $d \times d$ unitary operators. This group is naturally represented in our system by

\[ \mathbf{Q}(U)|i_1 i_2 \cdots i_n\rangle = U|i_1\rangle \otimes U|i_2\rangle \otimes \cdots \otimes U|i_n\rangle, \tag{2} \]

where $U \in \mathcal{U}_d$.
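To make Eqs. (1) and (2) concrete, the following NumPy sketch builds both representations as explicit matrices for small $n$ and $d$. It is only an illustration under stated conventions: qudit positions and basis labels are 0-indexed, a permutation is a tuple with `perm[k]` equal to the image of `k`, and the function names are hypothetical.

```python
import itertools
import numpy as np

def perm_operator(perm, d):
    """P(pi): sends |i_1 ... i_n> to |i_{pi^-1(1)} ... i_{pi^-1(n)}>."""
    n = len(perm)
    inv = [perm.index(k) for k in range(n)]  # inv[k] = pi^{-1}(k)
    dims = [d] * n
    op = np.zeros((d ** n, d ** n))
    for idx in itertools.product(range(d), repeat=n):
        out = tuple(idx[inv[k]] for k in range(n))
        row = np.ravel_multi_index(out, dims)
        col = np.ravel_multi_index(idx, dims)
        op[row, col] = 1.0
    return op

def local_operator(u, n):
    """Q(U) = U tensor U tensor ... tensor U with n factors."""
    op = np.eye(1)
    for _ in range(n):
        op = np.kron(op, u)
    return op
```

Multiplying `perm_operator(perm, d)` and `local_operator(u, n)` in either order gives the same matrix, since permuting identical tensor factors does not change $U^{\otimes n}$.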

The Schur transform is based on Schur duality, a well-known [8] and powerful way to relate the representation theory of $\mathbf{P}(\pi)$ and $\mathbf{Q}(U)$. For example, consider the case of two qubits ($n = 2$, $d = 2$). The two-qubit Hilbert space $(\mathbb{C}^2)^{\otimes 2}$ decomposes under $\mathbf{Q}$ into a one-dimensional spin-0 singlet space spanned by $\frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$ and a three-dimensional spin-1 triplet space spanned by $|00\rangle$, $|11\rangle$, and $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$. Both of these spaces transform irreducibly under the action of $\mathbf{Q}(U)$, $U \in \mathcal{U}_2$, meaning that the action of $\mathbf{Q}(U)$ does not mix these two subspaces and these are the minimal such nonmixing subspaces which exist. Schur duality is related to the fact that these subspaces also happen to be irreducible representations (irreps) of $\mathcal{S}_2$. The singlet state changes sign under permutation of the two spins, and the triplet states are invariant under permutation. These correspond to the sign $\mathcal{P}_{\rm sign}$ and the trivial $\mathcal{P}_{\rm trivial}$ irreps of $\mathcal{S}_2$, and thus we can write $(\mathbb{C}^2)^{\otimes 2} = (\mathcal{Q}_1 \otimes \mathcal{P}_{\rm trivial}) \oplus (\mathcal{Q}_0 \otimes \mathcal{P}_{\rm sign})$, where $\mathcal{Q}_j$ is the spin-$j$ irrep of $\mathcal{U}_2$.

This relation between the two representations exists for an arbitrary number of qudits, and in general both the $\mathcal{U}_d$ and $\mathcal{S}_n$ irreps will be nontrivial. For example, the Hilbert space of three qubits ($n = 3$, $d = 2$) decomposes into $(\mathcal{Q}_{3/2} \otimes \mathcal{P}_{\rm trivial}) \oplus (\mathcal{Q}_{1/2} \otimes \mathcal{P}_{2,1})$, where $\mathcal{P}_{2,1}$ denotes a particular two-dimensional mixed symmetry irrep of $\mathcal{S}_3$. In terms of the original (local) basis the $\mathcal{Q}_{1/2} \otimes \mathcal{P}_{2,1}$ space contains two spin-1/2 objects, one spanned by $|110\rangle + \omega|011\rangle + \omega^*|101\rangle$ (suppressing normalization) and $|001\rangle + \omega|100\rangle + \omega^*|010\rangle$, and the other obtained by replacing $\omega = e^{2\pi i/3}$ with $\omega^*$. These two spaces correspond to the two dimensions of $\mathcal{P}_{2,1}$.

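The general theorem of Schur duality states that for any (integer) $d$ and $n$,

\[ (\mathbb{C}^d)^{\otimes n} \cong \bigoplus_{\lambda \in \mathrm{Part}[n,d]} \mathcal{Q}_\lambda \otimes \mathcal{P}_\lambda, \tag{3} \]

where $\lambda$ is chosen from the set of possible partitions of $n$ into $\leq d$ parts, and simultaneously labels the $\mathcal{U}_d$-irrep $\mathcal{Q}_\lambda$ and the $\mathcal{S}_n$-irrep $\mathcal{P}_\lambda$. This goes beyond simultaneously diagonalizing the commuting representations $\mathbf{P}$ and $\mathbf{Q}$ because $\mathcal{P}_\lambda$ depends only on $n$ (through $\lambda$) and not $d$. Schur duality means that there exists a basis for $(\mathbb{C}^d)^{\otimes n}$ with states $|\lambda, q_\lambda, p_\lambda\rangle_{\rm Sch}$, where $\lambda$ labels the subspaces $\mathcal{Q}_\lambda \otimes \mathcal{P}_\lambda$ and $|q_\lambda\rangle \in \mathcal{Q}_\lambda$ and $|p_\lambda\rangle \in \mathcal{P}_\lambda$ label bases for $\mathcal{Q}_\lambda$ and $\mathcal{P}_\lambda$, respectively.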

Just as in the examples above, the Schur basis states $|\lambda, q_\lambda, p_\lambda\rangle_{\rm Sch}$ are superpositions of the $n$ qudit computational basis states $|i_1 i_2 \ldots i_n\rangle$,

\[ |\lambda, q_\lambda, p_\lambda\rangle_{\rm Sch} = \sum_{i_1, \ldots, i_n} [U_{\rm Sch}]^{\lambda, q_\lambda, p_\lambda}_{i_1 i_2 \ldots i_n} |i_1 i_2 \cdots i_n\rangle. \tag{4} \]

By the isomorphism of Eq. (3), this defines a unitary transformation $U_{\rm Sch}$ (with matrix elements as given), the Schur transform we desire. If we think of $U_{\rm Sch}$ as a quantum circuit, it maps the state $|\lambda, q_\lambda, p_\lambda\rangle_{\rm Sch}$ into the computational basis state $|\lambda, q_\lambda, p_\lambda\rangle$, with $\lambda$, $q_\lambda$, and $p_\lambda$ expressed as bit strings. Since $\dim(\mathcal{Q}_\lambda)$ and $\dim(\mathcal{P}_\lambda)$ vary with $\lambda$ we need to pad the $|q\rangle$ and $|p\rangle$ registers; this requires only constant spatial overhead. We know of no efficient classical algorithms to calculate even a single matrix element of $U_{\rm Sch}$, the best known results being recursive definitions of these matrix elements which require exponential time to evaluate [9]. The main purpose of this Letter is to show how the entire transformation can be performed on a quantum computer in $\mathrm{poly}(n, d)$ steps {implying as a corollary a classical algorithm for Schur transforming a vector of length $d^n$ in time $O[d^n\,\mathrm{poly}(n, d)]$}.
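Because $\dim(\mathcal{Q}_\lambda)$ and $\dim(\mathcal{P}_\lambda)$ vary with $\lambda$, it can help to tabulate them for small cases and confirm that they add up to $d^n$ as Eq. (3) requires. The sketch below does this in plain Python. It relies on two standard formulas that are not stated in the text (the hook length formula for $\mathcal{S}_n$ irreps and the hook content formula for $\mathcal{U}_d$ irreps), and all function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_parts, max_part=None):
    """Yield partitions of n into at most max_parts parts (non-increasing)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, max_parts - 1, first):
            yield (first,) + rest

def hook_lengths(shape):
    """Hook length of every cell (row r, column c) of the Young diagram."""
    cols = [sum(1 for row in shape if row > c) for c in range(shape[0])]
    return {(r, c): (shape[r] - c) + (cols[c] - r) - 1
            for r in range(len(shape)) for c in range(shape[r])}

def dim_P(shape):
    """Dimension of the S_n irrep labelled by shape (hook length formula)."""
    prod = 1
    for h in hook_lengths(shape).values():
        prod *= h
    return factorial(sum(shape)) // prod

def dim_Q(shape, d):
    """Dimension of the U_d irrep labelled by shape (hook content formula)."""
    result = Fraction(1)
    for (r, c), h in hook_lengths(shape).items():
        result *= Fraction(d + c - r, h)
    return int(result)

def schur_dimension_total(n, d):
    """Sum of dim Q_lambda * dim P_lambda over partitions of n into <= d parts."""
    return sum(dim_Q(lam, d) * dim_P(lam) for lam in partitions(n, d))
```

For three qubits this reproduces the decomposition quoted earlier: the partition (3) gives a 4-dimensional $\mathcal{Q}$ with a 1-dimensional $\mathcal{P}$, and (2,1) gives a 2-dimensional $\mathcal{Q}$ with a 2-dimensional $\mathcal{P}$.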

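The defining property of $U_{\rm Sch}$ is that it reduces the action of $\mathbf{Q}$ and $\mathbf{P}$ into irreps. For any $\pi \in \mathcal{S}_n$ and any $U \in \mathcal{U}_d$, $\mathbf{P}(\pi)$ and $\mathbf{Q}(U)$ commute, so we can express both reductions at once as

\[ U_{\rm Sch} \mathbf{Q}(U) \mathbf{P}(\pi) U_{\rm Sch}^\dagger = \sum_{\lambda \in \mathrm{Part}(d,n)} |\lambda\rangle\langle\lambda| \otimes \mathbf{q}_\lambda(U) \otimes \mathbf{p}_\lambda(\pi), \tag{5} \]

where $\mathbf{q}_\lambda$ and $\mathbf{p}_\lambda$ are irreps of $\mathcal{U}_d$ and $\mathcal{S}_n$, respectively.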

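Example of the Schur transform. Consider the case of two qubits ($n = 2$, $d = 2$). Here the Schur transform is the transform between the standard computational basis $|i_1, i_2\rangle$ and a basis describing the singlet and triplet states. Explicitly, the matrix elements of the Schur transform, as in Eq. (4), are given by

\[
\begin{pmatrix}
|\lambda = (1,1), q_\lambda = 0, p_\lambda = 0\rangle_{\rm Sch} \\
|\lambda = (2,0), q_\lambda = +1, p_\lambda = 0\rangle_{\rm Sch} \\
|\lambda = (2,0), q_\lambda = 0, p_\lambda = 0\rangle_{\rm Sch} \\
|\lambda = (2,0), q_\lambda = -1, p_\lambda = 0\rangle_{\rm Sch}
\end{pmatrix}
=
\begin{pmatrix}
0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\
1 & 0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
|00\rangle \\ |01\rangle \\ |10\rangle \\ |11\rangle
\end{pmatrix}.
\tag{6}
\]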


Here $\lambda = (1, 1)$ labels the singlet and $\lambda = (2, 0)$ labels the triplet. In this simple case, the permutation irreps are both one dimensional. Further, as noted above, when we implement this we must express the labels $\lambda$, $p_\lambda$, $q_\lambda$ in terms of bit strings from some computational basis. For example, we could label $\lambda$ by a single qubit and $q_\lambda$ by two qubits (no qubits are required for $p_\lambda$ in this example).
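As a small numerical illustration of this example, the NumPy sketch below writes the two-qubit Schur transform as a $4 \times 4$ matrix (rows ordered singlet first, then the triplet with $q_\lambda = +1, 0, -1$; columns ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$) and conjugates operators into the Schur basis. The variable and function names are illustrative only.

```python
import numpy as np

s = 1 / np.sqrt(2)

# Rows: singlet (1,1); triplet (2,0) with q = +1, 0, -1.
# Columns: |00>, |01>, |10>, |11>.
U_SCH = np.array([[0, s, -s, 0],
                  [1, 0,  0, 0],
                  [0, s,  s, 0],
                  [0, 0,  0, 1]])

# P(pi) for the transposition of the two qubits.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])

def in_schur_basis(op):
    """Return U_Sch op U_Sch^dagger for a 4x4 operator."""
    return U_SCH @ op @ U_SCH.conj().T
```

In the Schur basis the swap becomes diagonal (sign on the singlet, trivial on the triplet), and `np.kron(u, u)` for any single-qubit unitary `u` becomes block diagonal with a $1 \times 1$ and a $3 \times 3$ block, in line with Eq. (5).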

Applications of the Schur transform. The numerous applications of the Schur transform mentioned in the introduction [2-7] solve a variety of problems which are relevant to quantum information theory as well as to experiments designed to acquire information or resources from a quantum system. Applying the Schur transform extracts $\lambda$, $q$, and $p$ values for a given state, allowing the values to be manipulated like any other quantum data. Here we briefly review a few of these applications, focusing on the ones most relevant to physics.

One application of the Schur transform is spectrum estimation [2]. In spectrum estimation, we are given access to $n$ copies of a density operator $\rho = \sum_i p_i |i\rangle\langle i|$. Suppose the experimentalist wishes to estimate the values of the eigenvalues $p_i$ but does not know the basis $|i\rangle$. Using the Schur transform is the optimal method for estimating this spectrum for any value of $n$. In particular, if we are given $\rho^{\otimes n}$, then performing the Schur transform on this state followed by measuring the irrep label $\lambda$ provides an estimate of the spectrum by taking the partition $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_d)$ and dividing each $\lambda_i$ by $n$: $p_i \approx \lambda_i/n$. In the limit of large $n$ this estimate is optimal [2].

Consider, for example, spectrum estimation for the $n = 2$, $d = 2$ example given above. Let $\rho = p|0\rangle\langle 0| + (1 - p)|1\rangle\langle 1|$ be a fixed state and assume the experimentalist does not know the basis $|0\rangle$, $|1\rangle$. In order to estimate this spectrum if we are given two copies of $\rho$, we perform the Schur transform, Eq. (6), on these two qubits and measure the $\lambda$ register. If we get $\lambda = (2, 0)$ we estimate that the spectrum is that of a pure state $p_1 = 1$, $p_2 = 0$ and if we get $\lambda = (1, 1)$ we estimate that the spectrum is that of the fully mixed state $p_1 = p_2 = 1/2$. For a given value of $p$, the $\lambda = (2, 0)$ case occurs with probability $1 - p(1 - p)$, and the $\lambda = (1, 1)$ case occurs with probability $p(1 - p)$. Note that only if $p = 0$ or 1 does this estimate exactly reproduce the spectrum. Hence we learn a little about the spectrum with two copies of $\rho$; note that what we have learned is independent of the basis $|0\rangle$, $|1\rangle$. In the limit of a large number of copies, $n \gg 1$, the Schur transform provides the optimal estimate of the spectrum.
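The outcome probabilities quoted in this example can be reproduced numerically. The following NumPy sketch computes the probability of each $\lambda$ outcome for two copies of $\rho$ by projecting onto the singlet state; the optional `basis` argument (a hypothetical name) rotates the unknown eigenbasis to illustrate that the result does not depend on it.

```python
import numpy as np

def lambda_probabilities(p, basis=None):
    """Probabilities of the lambda outcomes for two copies of rho.

    rho has eigenvalues p and 1 - p in the eigenbasis given by the
    columns of `basis` (the computational basis if omitted).
    """
    if basis is None:
        basis = np.eye(2)
    rho = basis @ np.diag([p, 1 - p]) @ basis.conj().T
    rho2 = np.kron(rho, rho)
    singlet = np.array([0, 1, -1, 0]) / np.sqrt(2)
    p_singlet = float(np.real(singlet @ rho2 @ singlet))
    return {(2, 0): 1 - p_singlet, (1, 1): p_singlet}
```

For any unitary `basis` the singlet probability equals $p(1 - p)$, since the singlet only picks up a phase under $U \otimes U$.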

Another  application  of  the  Schur  transform  is  to  encode 
quantum  information  into  noiseless  subsystems  which 
arise  due  to  collective  decoherence  [6].  Here  we  run  the 
Schur  transform  backwards  (this  can  be  done  for  the  circuit 
by  applying  the  inverse  of  every  gate  and  reversing  the 
order  of  the  gates).  If  we  input  into  the  inverse  Schur 
transform a fixed label $|\lambda\rangle$, some arbitrary information in the $|q_\lambda\rangle$ register, and the information we wish to encode in a noiseless manner into the $|p_\lambda\rangle$ basis, then the $n$ qudit
states  output  from  this  transform  are  encoded  in  a  noiseless 
manner.  In  particular,  the  effect  of  decoherence  which 
couples  identically  to  each  of  the  n  qudit  states  acts  trivi¬ 
ally  on  the  encoded  information.  Noiseless  subsystems 
have  already  been  implemented  in  ion  trap  quantum  com¬ 
puters  [10]  and  our  transform  makes  feasible  their  use  for 
larger  systems. 

As an example of the Schur transform in quantum information theory, consider the situation where Alice and Bob share $n$ copies of a partially entangled state $|\psi\rangle_{AB} = \sum_i \sqrt{p_i}\,|i\rangle_A |i\rangle_B$ and they wish to extract the maximal number of maximally entangled states, $\frac{1}{\sqrt{d_e}} \sum_{i=1}^{d_e} |i\rangle_A |i\rangle_B$, from these $n$ copies. Alice and Bob's local density matrices are invariant under permutations of their $n$ copies, so if they perform the Schur transform and measure the $|\lambda\rangle$ basis, this leaves their $|p_\lambda\rangle$ registers in a maximally entangled state. If $|\psi\rangle$ is unknown and no classical communication is allowed, then this is an optimal distortion-free entanglement protocol [5]. Note that in order to make this protocol computationally tractable, we need to describe how the $|p_\lambda\rangle$ basis states are labeled in a way that can be efficiently and reversibly mapped to the integers $\{1, \ldots, \dim(\mathcal{P}_\lambda)\}$ [11].

Quantum circuit for the Schur transform. We construct a quantum circuit [12] for $U_{\rm Sch}$ in two stages, first for $d = 2$, then generalizing to $d > 2$. Each of these constructions follows an iterative structure, in which the Schur transform on $n$ qudits is realized using $n$ elementary steps, each of which adds a single qudit to an existing Schur state of the form $|\lambda, q, p\rangle$.

For $d = 2$, this elementary step corresponds to the addition of angular momentum, and the matrix elements of the unitary transform are known as Clebsch-Gordan (CG) coefficients [13]. In this case, $\lambda$ and $q$ can be conveniently denoted by half integers $j$ and $m$ (with $|m| \leq j \leq n/2$) which give the total angular momentum and the $z$-component of angular momentum, respectively. In terms of $j$, the CG transform takes as input $|j, m\rangle$ and a single spin $|s = \pm 1/2\rangle$, and outputs a linear combination of the states $|j' = j \pm 1/2, m' = m + s\rangle$. The amplitudes of the linear combination are readily computed using the usual ladder operators for raising and lowering angular momenta [13]. In addition, however, we must distinguish between multiple distinct pathways which add up to give the same total $j$, as demonstrated by the three qubit example above. In fact, it is the permutation symmetry of these pathways which gives rise to $\mathcal{P}_\lambda$, and thus we track the pathway with another output label $p = j' - j$.

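Putting this together, we can define an elementary Clebsch-Gordan transform step $U_{\rm CG}$ as a rotation between two specific basis states,

\[
\begin{pmatrix}
|j'_-, m', p = -\tfrac{1}{2}\rangle \\
|j'_+, m', p = +\tfrac{1}{2}\rangle
\end{pmatrix}
=
\begin{pmatrix}
\cos\theta_{j,m'} & -\sin\theta_{j,m'} \\
\sin\theta_{j,m'} & \cos\theta_{j,m'}
\end{pmatrix}
\begin{pmatrix}
|j, m_+\rangle |s = -\tfrac{1}{2}\rangle \\
|j, m_-\rangle |s = +\tfrac{1}{2}\rangle
\end{pmatrix},
\tag{7}
\]

where $j'_\pm = j \pm 1/2$, $m_\pm = m' \pm 1/2$, and $\cos\theta_{j,m'} = \sqrt{\frac{j + m' + 1/2}{2j + 1}}$. $U_{\rm CG}$ can be realized with three gates in a quantum circuit, as shown in Fig. 1, using as one gate a controlled rotation about $y$ by angle $\theta_{j,m'}$. This angle is computed using usual quantum and reversible circuit techniques [12] with error $\epsilon$, using $\mathrm{poly}[\log(1/\epsilon)]$ standard circuit elements.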
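The rotation in Eq. (7) is simple enough to tabulate classically. The Python sketch below returns the $2 \times 2$ matrix for given $j$ and $m'$; it assumes $\theta_{j,m'}$ lies in $[0, \pi/2]$ so that $\sin\theta_{j,m'}$ is the non-negative root fixed by normalization, and the function name is illustrative. At the extreme values $m' = \pm(j + 1/2)$ the $j'_-$ state does not exist and only the $j'_+$ row is physically meaningful.

```python
import math

def cg_rotation(j, m_prime):
    """2x2 rotation of Eq. (7) for total spin j and output m' = m + s.

    Rows correspond to j' = j - 1/2 and j' = j + 1/2; columns to the
    inputs |j, m' + 1/2>|s = -1/2> and |j, m' - 1/2>|s = +1/2>.
    """
    cos_t = math.sqrt((j + m_prime + 0.5) / (2 * j + 1))
    # sin^2 = 1 - cos^2; the non-negative root is an assumption on the angle.
    sin_t = math.sqrt((j - m_prime + 0.5) / (2 * j + 1))
    return [[cos_t, -sin_t], [sin_t, cos_t]]
```

For $j = 1/2$ and $m' = 0$ both entries equal $1/\sqrt{2}$, which recovers the singlet and the $m = 0$ triplet state of the two-qubit example.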

The full Schur transform is implemented by cascading $U_{\rm CG}$ as shown in Fig. 2. The complexity of this circuit is thus $O(n\,\mathrm{poly}\log(1/\epsilon))$. We now claim that $|p_1, \ldots, p_n\rangle := |p\rangle$ labels a basis for $\mathcal{P}_j$. This follows from Eq. (3) and the fact that the $p_k = j_k - j_{k-1}$, $k = 1, \ldots, n$ are invariant under $\mathbf{Q}$, while $j$ and $m$ are invariant under $\mathbf{P}$. In fact, since $j_k$ describes the action of $\mathcal{S}_k$ on the first $k$ qubits, $|p_1, \ldots, p_n\rangle$ is a subgroup-adapted basis for the chain $\mathcal{S}_1 \subset \mathcal{S}_2 \subset \ldots \subset \mathcal{S}_n$, also known as Young's orthogonal basis [14]. This basis is also used in the only known fast quantum Fourier transform over $\mathcal{S}_n$ [14,15].

FIG. 1. Quantum circuit implementing $U_{\rm CG}$ to convert between the $|j, m\rangle|s\rangle$ and $|j', m', p\rangle$ bases, for the $d = 2$ (qubit) case. Following standard conventions [12], time goes from left to right, the $|j\rangle$ and $|m\rangle$ wires hold multiple qubits, and $|s\rangle$ is one qubit. The controlled $X$ operation $C_X$ adds the control to the target qubits, i.e., $C_X|s\rangle|m\rangle = |s\rangle|m + s\rangle$. The doubly controlled $R_y(\theta_{j,m'})$ gate implements the rotation given by Eq. (7) using the $j$ and $m'$ qubits.

Construction of the Schur transform for $d > 2$ follows the same ideas as for $d = 2$, but is complicated by the challenge of showing that the elementary $U_{\rm CG}$ steps for $d > 2$ can be computed in $\mathrm{poly}(d)$ steps [a direct construction along the lines of Eq. (7) would require $n^{O(d^2)}$ steps]. $U_{\rm Sch}$ is constructed as a cascade of $O(n)$ $U_{\rm CG}$ transforms, just as for $d = 2$. Each $U_{\rm CG}$ combines a state $|\lambda, q_\lambda\rangle$ [with $\lambda \in \mathrm{Part}(d, k - 1)$ and $|q_\lambda\rangle \in \mathcal{Q}_\lambda$] with a single qudit state $|i_k\rangle$, to obtain a superposition of states $|\lambda', q'_{\lambda'}\rangle$ [with $\lambda' \in \mathrm{Part}(d, k)$ and $|q'_{\lambda'}\rangle \in \mathcal{Q}_{\lambda'}$]. Simultaneously, the permutation labels $|p\rangle$ are constructed; equivalently, we could save the values of $\lambda$ that we generate in each step, just as $p_1, \ldots, p_n$ are equivalent to $j_1, \ldots, j_n$ for $d = 2$. $U_{\rm CG}$ can be computed efficiently because of a recursive relationship between $U_{\rm CG}$ for $\mathcal{U}_d \times \mathcal{U}_d$ and that of $\mathcal{U}_{d-1} \times \mathcal{U}_{d-1}$ in terms of reduced Wigner coefficients [16]. Crucially, there is an efficient classical algorithm for the computation of the reduced Wigner coefficients [9] needed for $U_{\rm CG}$. Details of this calculation are given elsewhere [11]. The complexity of the full Schur transform is thus found to be polynomial in $n$, $d$, and $\log(\epsilon^{-1})$.

Conclusion. We have shown how to efficiently perform the Schur transform. Without efficient implementations of the Schur transform, the various physical and quantum information tasks we have discussed [2-7] are not practical in the lab. As a final note, we comment on the Schur transform as it relates to the search for new quantum algorithms. An important open problem here is to find a black-box problem for which the Schur transform offers a speedup over classical algorithms. In this respect, there are few unitary transforms which have both an efficient quantum circuit and interpretations which might allow these transforms to be useful in an algorithm. We are hopeful that our circuits will be useful for quantum algorithms exactly because they have such clear group representation theory interpretations.

FIG. 2. Quantum circuit for the Schur transformation $U_{\rm Sch}$, transforming between $|i_1 i_2 \cdots i_n\rangle$ and $|j, m, p\rangle$. The fact that the $|p_1, p_2, \ldots, p_n\rangle$ is a full basis is intimately related to Schur duality.

We give a simple recipe for translating walks on Cayley graphs of a group G into a
quantum  operation  on  any  irrep  of  G.  Most  properties  of  the  classical  walk  carry  over 
to  the  quantum  operation:  degree  becomes  the  number  of  Kraus  operators,  the  spectral 
gap  lower-bounds  the  gap  of  the  quantum  operation  (viewed  as  a  linear  map  on  density 
matrices),  and  the  quantum  operation  is  efficient  whenever  the  classical  walk  and  the 
quantum  Fourier  transform  on  G  are  efficient.  This  means  that  using  classical  constant- 
degree  constant-gap  families  of  Cayley  expander  graphs  on  groups  such  as  the  symmetric 
group,  we  can  construct  efficient  families  of  quantum  expanders. 

Communicated  by:  R  Cleve  &  B  Terhal 


1  Background 

Classical  expanders  can  be  defined  in  either  combinatorial  or  spectral  terms,  while  quantum 
expanders  usually  have  only  a  spectral  definition.  Quantum  expanders  were  introduced  in  [1] 
for  their  application  to  quantum  spin  chains  and  in  [2]  for  applications  to  quantum  statistical 
zero knowledge. Here we (following [1] and [3]) define an $(N, D, \lambda)$ quantum expander to be a quantum operation $\mathcal{E}$ that

• Has $N$-dimensional input and output.

• Has $\leq D$ Kraus operators.

• Has second-largest singular value $\leq \lambda$. Equivalently, if $\mathcal{E}(\rho) = \rho$ and $\operatorname{tr} \rho\sigma = 0$ then $\|\mathcal{E}(\sigma)\|_2 \leq \lambda\|\sigma\|_2$, where $\|X\|_2 := \sqrt{\operatorname{tr} X^\dagger X}$.

We say that $N$ is the dimension of the expander, $D$ its degree (by analogy with classical expanders) and $1 - \lambda$ its gap. Note that all quantum operations have at least one fixed state and thus at least one eigenvalue equal to one. The above definition is stricter than the one in [2], which demanded only that an expander increase the von Neumann entropy of a state by at most a constant amount. Finally, we say that an expander is efficient (or “explicit”) if it can be implemented on a quantum computer in time $\mathrm{poly}(\log N)$. This paper will describe a new method for constructing quantum expanders, which will in some cases yield efficient $(N, O(1), \Omega(1))$ expanders for all values of $N > 1$.

2 Previous work on efficient quantum expanders

In [4] it was shown that, just as random constant-degree graphs are likely to be expander graphs, quantum operations that apply one of a constant number of random unitaries from $U(N)$ are likely to be quantum expanders, with spectral gap approaching the optimal value as $N \to \infty$. Naturally such expanders cannot be efficiently constructed: generic elements of $U(N)$ require on the order of $N^2$ gates to construct [5], and if we want to produce the expander deterministically, the only proposed method [3, Sec. 3.3] does an exhaustive search over $\exp(\Omega(N))$ different unitaries. As there are $\log N$ qubits, this could potentially take time doubly-exponential in the number of qubits.

Prescriptions  for  potentially  efficient  constructions  are  given  in  [1]  and  [2].  Both  begin 
with  classical  expanders  and  turn  them  into  quantum  expanders.  The  proposal  in  [1]  is  to 
start with a so-called “tensor power expander” and then to add phases. A tensor power expander is a degree $D$ graph $(V, E)$ where: (a) each outgoing edge is labelled $1, \ldots, D$, and (b) if $G'$ is the graph with vertices $V \times V$ and edges given by all pairs $(e_1, e_2) \in E \times E$ such that $e_1$ and $e_2$ have the same label, then $G'$ is an expander. Unfortunately, when Cayley graphs
(see  Section  4  for  definition)  are  labeled  in  the  natural  way  (with  label  g  corresponding  to 
multiplication  by  group  element  g)  they  are  not  tensor  power  expanders.  It  seems  plausible 
that  random  constant-degree  graphs  would  be  tensor  power  expanders,  but  this  has  not  been 
proven. 

The  approach  of  [2]  is,  like  this  paper,  to  turn  classical  Cayley  graph  expanders  into 
quantum  expanders.  Its  main  idea  is  to  apply  a  classical  expander  twice:  first  in  the  standard 
basis,  and  then  conjugated  by  a  sort  of  generalized  Hadamard  transform  (which  they  call  a 
“good  basis  change”),  so  that  it  acts  in  a  conjugate  basis.  Unfortunately,  the  quantum  Fourier 
transform  is  not,  by  itself,  always  enough  to  make  a  good  basis  change.  For  some  groups, 
such  as  SL(2,q),  it  is,  and  thus  [2]  obtain  a  quantum  expander  based  on  the  classical  LPS 
expander graph. However, it is unknown how to perform the QFT on SL(2,q) efficiently (see
[6]  for  partial  progress),  and  so  we  do  not  know  how  to  efficiently  perform  the  basis  change 
required  for  their  construction.  On  the  other  hand,  while  there  are  groups  such  as  Sn  for 
which  both  efficient  QFT’s  and  explicit  constant-degree  expanders  are  known,  none  have  yet 
been  proved  to  satisfy  the  additional  property  needed  for  the  QFT  to  be  a  good  basis  change. 

Very  recently,  two  different  constructions  of  efficient,  constant-degree  quantum  expanders 
have appeared. The first is described in [3]. Their approach is to generalize the classical zig-zag product [7] to quantum expanders, using a constant number of random unitaries [4] for the base case. Like our paper, [3] also describes a family of constant-degree, constant-gap, efficient expanders. A minor advantage of our construction is that it can be made to work for any dimension $N > 1$, while [3] requires that $N$ be of the form $D^{8t}$ for a positive integer $t$ and that $D > D_0$ for a universal constant $D_0$.

Another efficient constant-degree expander is given in [8]. Their approach is to turn the classical Margulis expander [9] into an operation on quantum phase space. This results in quantum expanders with the same parameters as the Margulis expander (degree 8, second largest eigenvalue $\lambda \leq 2\sqrt{5}/8$) in any dimension, including even infinite dimensional systems. While their paper only describes an efficient construction for dimensions of the form $N = d^n$ for small $d$, their approach is easily generalized to run in time $\mathrm{poly}\log N$ for any $N$.

Finally, if we relax the assumption that expanders have constant degree, then efficient constructions have been described in [10,11].

3  Representation  theory  notation 

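Let $G$ be a group (either finite or a compact Lie group), and $\hat{G}$ a complete set of inequivalent unitary irreducible representations (irreps). For an irrep $\lambda \in \hat{G}$ and a group element $g \in G$, we denote the representation matrix by $r_\lambda(g)$, its dimension by $d_\lambda$ and the space it acts upon by $V_\lambda$. Let $U_{\rm QFT}$ be the Fourier transform on $G$, corresponding to the isomorphism

\[ \mathbb{C}[G] \cong \bigoplus_{\lambda \in \hat{G}} V_\lambda \otimes V_\lambda^*. \]

It is given by the explicit formula $U_{\rm QFT} = \sum_{g \in G} \sum_{\lambda \in \hat{G}} \sum_{i,j=1}^{d_\lambda} \sqrt{d_\lambda/|G|}\; r_\lambda(g)_{ij}\, |\lambda, i, j\rangle\langle g|$. Let $L_x := \sum_{g \in G} |xg\rangle\langle g|$ denote the left multiplication operator. Then in the Fourier basis, this translates into action on the first tensor factor.

\[ U_{\rm QFT} L_x U_{\rm QFT}^\dagger = \sum_{\lambda \in \hat{G}} |\lambda\rangle\langle\lambda| \otimes r_\lambda(x) \otimes I_{d_\lambda}. \tag{1} \]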


4  Expander  construction 

Let $G$ be a group with a generating set $\Gamma \subset G$. Define the Cayley graph $(G; \Gamma)$ to have vertex
set $G$ and edges $(g, xg)$ for each $g \in G$ and each $x \in \Gamma$. We will be interested in the case when
$(G; \Gamma)$ is an expander graph.

Choose any non-trivial $\lambda \in \hat{G}$. Our quantum expander is defined as follows. Let $\mathcal{E}$ be the
quantum operation on $V_\lambda$ given by

$$\mathcal{E}(\rho) = \frac{1}{|\Gamma|} \sum_{g \in \Gamma} r_\lambda(g)\, \rho\, r_\lambda(g)^\dagger. \qquad (2)$$

This operation acts on a $d_\lambda$-dimensional space by choosing a uniformly random $g \in \Gamma$ and
then applying the (unitary) representation matrix $r_\lambda(g)$. We will see below ways in which
$r_\lambda(g)$ can be implemented on a quantum computer.

I  claim  that 


1. The degree of $\mathcal{E}$ is $\le |\Gamma|$.

2. If (a) group multiplication in $G$ is efficient, (b) there is a procedure for efficiently
sampling from $\Gamma$, (c) the QFT on $G$ is efficient and (d) $\log |G| \le \mathrm{poly}(\log d_\lambda)$, then $\mathcal{E}$ can
be implemented efficiently.

3. $$\lambda_2(\mathcal{E}) \le \lambda_2(W_\Gamma). \qquad (3)$$
Here $\lambda_2(\mathcal{E})$ is the second largest singular value of $\mathcal{E}$, when interpreted as a linear map on
density matrices, while $\lambda_2(W_\Gamma)$ is the second-largest singular value of the Cayley graph
transition matrix:

$$W_\Gamma = \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} \sum_{g \in G} |\gamma g\rangle\langle g|.$$

Thus, classical Cayley graph expanders give quantum expanders.


Proof of claims (1-3). The first claim is immediate. In the second claim, we use the fact
that $r_\lambda(g)$ can be applied to $|\psi\rangle \in V_\lambda$ by performing the inverse QFT on $|\lambda\rangle|\psi\rangle|0\rangle$, applying
the map $|x\rangle \to |gx\rangle$, performing the QFT and keeping only the second register (see [12, Chap.
8] for details). Condition (d) is because we say the QFT on $G$ is efficient if it runs in time
$\mathrm{poly}(\log |G|)$, but we would like our expander to run in time $\mathrm{poly}(\log d_\lambda)$. Alternatively (a),
(c) and (d) can be replaced by any other efficient procedure for performing $r_\lambda(g)$ on a quantum
computer. (See Section 5 for examples.)

The only non-trivial claim above is (3). Assuming that $\Gamma$ generates $G$, the unique stationary
state of $W_\Gamma$ is the uniform distribution

$$|u\rangle = \frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle.$$

We can find the second largest eigenvalue by subtracting off a projector onto the stationary
state and taking the operator norm. Thus

$$\lambda_2(W_\Gamma) = \big\| W_\Gamma - |u\rangle\langle u| \big\|_\infty, \qquad (4)$$

where $\|M\|_\infty$ is the largest singular value of $M$.

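Similarly, the maximally mixed state $\tau := I_{d_\lambda}/\sqrt{d_\lambda}$ is a stationary state of $\mathcal{E}$. We choose
the normalization so that $\tau$ will be a unit vector with respect to the Hilbert-Schmidt inner
product $\langle A, B\rangle := \mathrm{tr}\, A^\dagger B$. However, to analyze $\mathcal{E}$ as a linear operator, it is simpler to think
of it as acting on vectors. The corresponding linear map is denoted $\hat{\mathcal{E}}$ and is defined to be

$$\hat{\mathcal{E}} = \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} r_\lambda(\gamma) \otimes r_\lambda(\gamma)^*, \qquad (5)$$

where the $*$ denotes the entry-wise complex conjugate with respect to a basis $B_\lambda$ for $V_\lambda$. Then
$|\hat{\tau}\rangle := d_\lambda^{-1/2} \sum_{b \in B_\lambda} |b\rangle \otimes |b\rangle$ is a fixed point of $\hat{\mathcal{E}}$. Thus

$$\lambda_2(\mathcal{E}) = \big\| \hat{\mathcal{E}} - |\hat{\tau}\rangle\langle\hat{\tau}| \big\|_\infty. \qquad (6)$$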

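We now use representation theory to analyze (4) and (6). First, examine (4). Since $U_{\mathrm{QFT}}$ is
unitary, $\| W_\Gamma - |u\rangle\langle u| \|_\infty = \| U_{\mathrm{QFT}} W_\Gamma U_{\mathrm{QFT}}^\dagger - U_{\mathrm{QFT}} |u\rangle\langle u| U_{\mathrm{QFT}}^\dagger \|_\infty$. Since $U_{\mathrm{QFT}}|u\rangle = |\mathrm{trivial}\rangle$,
we can use (1) to obtain

$$\begin{aligned}
\lambda_2(W_\Gamma) &= \big\| W_\Gamma - |u\rangle\langle u| \big\|_\infty \\
&= \Big\| \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} \sum_{\lambda \in \hat{G}} |\lambda\rangle\langle\lambda| \otimes r_\lambda(\gamma) \otimes I_{d_\lambda} - |\mathrm{trivial}\rangle\langle\mathrm{trivial}| \Big\|_\infty && (7) \\
&= \Big\| \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} \sum_{\substack{\lambda \in \hat{G} \\ \lambda \neq \mathrm{trivial}}} |\lambda\rangle\langle\lambda| \otimes r_\lambda(\gamma) \otimes I_{d_\lambda} \Big\|_\infty && (8) \\
&= \max_{\lambda \neq \mathrm{trivial}} \Big\| \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} r_\lambda(\gamma) \Big\|_\infty. && (9)
\end{aligned}$$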



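A similar argument applies to (6) as well. Here the first step is to decompose $V_\lambda \otimes V_\lambda^*$ into
irreps of $G$. In general,

$$V_\lambda \otimes V_\lambda^* \cong \bigoplus_{\nu \in \hat{G}} V_\nu \otimes \mathbb{C}^{m_\nu},$$

where $m_\nu$ is the multiplicity (possibly zero) of $V_\nu$ in $V_\lambda \otimes V_\lambda^*$. Let $U_{\mathrm{CG}}$ be the unitary
transform implementing the above isomorphism. Then by definition,

$$U_{\mathrm{CG}} \big( r_\lambda(g) \otimes r_\lambda(g)^* \big) U_{\mathrm{CG}}^\dagger = \sum_{\nu \in \hat{G}} |\nu\rangle\langle\nu| \otimes r_\nu(g) \otimes I_{m_\nu}. \qquad (10)$$

We can use this to analyze the spectrum of $\hat{\mathcal{E}}$. In particular

$$U_{\mathrm{CG}}\, \hat{\mathcal{E}}\, U_{\mathrm{CG}}^\dagger = \sum_{\nu \in \hat{G}} |\nu\rangle\langle\nu| \otimes \Big( \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} r_\nu(\gamma) \Big) \otimes I_{m_\nu}. \qquad (11)$$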

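From Schur's Lemma, we know that $m_{\mathrm{trivial}} = 1$, corresponding to the stationary state $|\hat{\tau}\rangle$.
Thus

$$\begin{aligned}
\lambda_2(\mathcal{E}) &= \big\| \hat{\mathcal{E}} - |\hat{\tau}\rangle\langle\hat{\tau}| \big\|_\infty && (12) \\
&= \big\| U_{\mathrm{CG}} \big( \hat{\mathcal{E}} - |\hat{\tau}\rangle\langle\hat{\tau}| \big) U_{\mathrm{CG}}^\dagger \big\|_\infty && (13) \\
&= \max_{\substack{\nu \neq \mathrm{trivial} \\ m_\nu \neq 0}} \Big\| \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} r_\nu(\gamma) \Big\|_\infty && (14) \\
&\le \max_{\nu \neq \mathrm{trivial}} \Big\| \frac{1}{|\Gamma|} \sum_{\gamma \in \Gamma} r_\nu(\gamma) \Big\|_\infty && (15) \\
&= \lambda_2(W_\Gamma). && (16)
\end{aligned}$$

This completes the proof. □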
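As a numerical sanity check of claim (3), the following sketch compares $\lambda_2(\mathcal{E})$ with $\lambda_2(W_\Gamma)$ in one small case. All choices here are illustrative assumptions rather than part of the construction above: the group is $S_3$, the generating set `gamma` is one transposition plus the two 3-cycles, the irrep is the two-dimensional standard representation (obtained by restricting permutation matrices to the subspace orthogonal to the all-ones vector), and the helper names are hypothetical. Singular values are estimated by plain power iteration to avoid any dependency outside the standard library.

```python
import itertools
import math

def compose(p, q):
    """Return p∘q: apply q first, then p (permutations stored as tuples)."""
    return tuple(p[q[i]] for i in range(len(q)))

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def operator_norm(m, iters=300):
    """Largest singular value of a real square matrix, by power iteration on m^T m."""
    mtm = matmul(transpose(m), m)
    n = len(mtm)
    v = [math.sqrt(i + 2.0) for i in range(n)]  # arbitrary generic start vector
    for _ in range(iters):
        w = [sum(mtm[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    w = [sum(mtm[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(math.sqrt(sum(x * x for x in w)))

def perm_matrix(p):
    """Defining representation: column j has a single 1 in row p[j]."""
    n = len(p)
    return [[1.0 if p[j] == i else 0.0 for j in range(n)] for i in range(n)]

# Columns form an orthonormal basis of the subspace of R^3 orthogonal to (1, 1, 1).
B = [[1 / math.sqrt(2), 1 / math.sqrt(6)],
     [-1 / math.sqrt(2), 1 / math.sqrt(6)],
     [0.0, -2 / math.sqrt(6)]]

def standard_irrep(p):
    """2x2 real orthogonal matrix of p in the standard irrep of S_3."""
    return matmul(transpose(B), matmul(perm_matrix(p), B))

group = list(itertools.permutations(range(3)))
gamma = [(1, 0, 2), (1, 2, 0), (2, 0, 1)]  # assumed generating set, closed under inverses

def lambda2_cayley():
    """|| W_Gamma - |u><u| ||, as in Eq. (4)."""
    index = {g: i for i, g in enumerate(group)}
    n = len(group)
    w = [[-1.0 / n] * n for _ in range(n)]  # start from -|u><u|
    for x in gamma:
        for g in group:
            w[index[compose(x, g)]][index[g]] += 1.0 / len(gamma)
    return operator_norm(w)

def lambda2_quantum():
    """|| E_hat - |tau_hat><tau_hat| ||, as in Eq. (6)."""
    d = 2
    e_hat = [[0.0] * (d * d) for _ in range(d * d)]
    for x in gamma:
        r = standard_irrep(x)
        k = kron(r, r)  # r is real, so its entry-wise conjugate is r itself
        for i in range(d * d):
            for j in range(d * d):
                e_hat[i][j] += k[i][j] / len(gamma)
    tau = [1 / math.sqrt(d) if i // d == i % d else 0.0 for i in range(d * d)]
    for i in range(d * d):
        for j in range(d * d):
            e_hat[i][j] -= tau[i] * tau[j]
    return operator_norm(e_hat)
```

In this particular example the standard irrep already attains the maximum in (9), so the two quantities coincide and the inequality (3) holds with equality up to numerical error.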

5  Examples  of  quantum  expanders 

If $G = S_n$ then we can use the explicit expander of [13] and the efficient QFT of [14]. The
dimension $N = d_\lambda$ can be the size of any irrep of $S_n$, which asymptotically can be as large as
$\sqrt{n!}\, \exp(-O(\sqrt{n}))$. Run-time is thus poly-logarithmic in the dimension, meaning polynomial
in the number of qubits. However if we would like an expander on exactly $N$ dimensions, we
are not guaranteed that $n \le \mathrm{poly}\log(N)$ exists such that $d_\lambda = N$ for some $\lambda \in \hat{S}_n$, nor do we
know how to efficiently check, for a given $n$, whether such a $\lambda$ exists. (For completeness, we
mention here that irreps of $S_n$ are labeled by partitions $(\lambda_1, \ldots, \lambda_n)$ with $\lambda_1 + \ldots + \lambda_n = n$ and
$\lambda_1 \ge \ldots \ge \lambda_n \ge 0$. Their dimension is given by $d_\lambda = n! \prod_{i<j} (\lambda_i - \lambda_j - i + j) / \prod_i (\lambda_i + n - i)!$.)
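The dimension formula quoted in the parenthetical above is easy to evaluate exactly with integer arithmetic. The sketch below is an illustration only; the function names `irrep_dimension` and `partitions` are hypothetical. It pads the partition with zeros to $n$ parts, forms the shifted parts $\ell_i = \lambda_i + n - i$, and uses $\lambda_i - \lambda_j - i + j = \ell_i - \ell_j$.

```python
from math import factorial

def irrep_dimension(partition):
    """Dimension of the S_n irrep labeled by `partition`, using
    d = n! * prod_{i<j} (l_i - l_j) / prod_i l_i!,  with l_i = lambda_i + n - i."""
    n = sum(partition)
    lam = list(partition) + [0] * (n - len(partition))   # pad to n parts
    shifted = [lam[i] + n - (i + 1) for i in range(n)]   # i is 1-based in the text
    numerator = factorial(n)
    for i in range(n):
        for j in range(i + 1, n):
            numerator *= shifted[i] - shifted[j]
    denominator = 1
    for value in shifted:
        denominator *= factorial(value)
    assert numerator % denominator == 0  # the ratio is always an integer
    return numerator // denominator

def partitions(n, largest=None):
    """Yield all partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest
```

For instance, `irrep_dimension((N, 1))` returns `N`, which is the fact used at the end of this section, and summing the squared dimensions over all partitions of $n$ recovers $n!$.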

Some other Cayley graph constructions also carry over. For example, the (classical) zig-zag
product can be interpreted as a Cayley graph, where the group is an iterated wreath
product [15]. Additionally, the irreps of these wreath products are large (although also with
possibly inconvenient dimensions) and quantum Fourier transforms on them can be performed
efficiently [6]. Thus, classical zig-zag product expanders can also be used to construct efficient,
constant-degree, constant-gap quantum expanders. (We remark in passing that this construction
appears not to be related to the quantum zig-zag product of [3].)

If we permit approximate constructions then we can relax the assumption that $G$ is finite.
For example, if $G = SU(2)$ then several explicit expanders are known [16, 17], but no efficient
circuits are yet known for the QFT. It would suffice even to be able to implement $r_\lambda(g)$ in
time poly-logarithmic in $d_\lambda$. This latter result is claimed by [18], but the algorithm there is
missing crucial steps.

Finally, to construct expanders for any dimension $N > 1$ we can use the fact that the
$S_{N+1}$-irrep $\lambda = (N, 1)$ has dimension $N$. To implement $r_\lambda(\pi)$ for $\pi \in S_{N+1}$ we cannot use
the QFT on $S_{N+1}$, since our run-time needs to be $\mathrm{polylog}(N)$. However, we can instead
embed $V_\lambda$ into the $(N+1)$-dimensional defining representation of $S_{N+1}$, which is given by
$r_{\mathrm{def}}(\pi)|x\rangle = |\pi(x)\rangle$ for $x = 1, \ldots, N+1$. This representation is reducible and decomposes
into one copy of the trivial representation (spanned by $|1\rangle + \ldots + |N+1\rangle$) and one copy of the
$N$-dimensional irrep $V_{(N,1)}$. To embed $V_\lambda$ in the defining representation, we can use any
$(N+1)$-dimensional unitary that maps $|N+1\rangle$ to $\frac{1}{\sqrt{N+1}} \sum_{x=1}^{N+1} |x\rangle$. Then performing $r_{\mathrm{def}}(\pi_j)$
(for Cayley graph generator $\pi_j$) requires only that $\pi_j(x)$ be computable from $j$ and $x$ in
time $\mathrm{poly}(\log N)$. A careful examination of the construction of [13] shows this to be the
case. Thus, this technique yields constant-degree, constant-gap explicit expanders for any
dimension $N > 1$. (Of course, for low enough values of $N$ the degree will be larger than
$N^2$ and so the resulting expander will be inferior to the trivial "expander" which applies a
random generalized Pauli matrix.)

We introduce the concept of quantum tensor product expanders. These generalize the
concept  of  quantum  expanders,  which  are  quantum  maps  that  are  efficient  randomizers 
and  use  only  a  small  number  of  Kraus  operators.  Quantum  tensor  product  expanders 
act  on  several  copies  of  a  given  system,  where  the  Kraus  operators  are  tensor  products  of 
the  Kraus  operator  on  a  single  system.  We  begin  with  the  classical  case,  and  show  that 
a  classical  two-copy  expander  can  be  used  to  produce  a  quantum  expander.  We  then 
discuss  the  quantum  case  and  give  applications  to  the  Solovay-Kitaev  problem.  We  give 
probabilistic  constructions  in  both  classical  and  quantum  cases,  giving  tight  bounds  on 
the  expectation  value  of  the  largest  nontrivial  eigenvalue  in  the  quantum  case. 

Keywords :  Quantum  computing,  Unitary  transform,  Wavelet 
Communicated  by:  R  Jozsa  &  J  Watrous 


1  Background:  classical  and  quantum  expanders 
1.1 Definitions

The concept of $t$-designs [1] provides a way of randomizing quantum states. For example, a
1-design is a set of unitaries $\{U_k\}$, where $k = 1, \ldots, K$, such that the average over the set
takes any input state to a maximally mixed state. A 2-design is a set of unitaries such that
applying $U_k \otimes U_k$ to a state on a bipartite system generates the twirling operation [2]. Quantum
expanders, as studied in Hamiltonian complexity [3], computer science [4], and quantum
information theory [5], provide a way of approximately realizing a 1-design by repeatedly
applying a completely positive map built out of a small number of unitaries. In this paper, we
introduce the concept of "tensor product expanders", which generalize this result and give us
a way to approximately realize $t$-designs. We also discuss the classical case, and show that
classical tensor product expanders can be used to generate quantum expanders.

Quantum expanders are a quantum analogue of expander graphs [8]. In the quantum case,
we consider a completely positive, trace preserving map

$$\mathcal{E}(M) = \sum_{s=1}^{D} A^\dagger(s)\, M\, A(s), \qquad (1)$$

where the number of Kraus operators $D$ is relatively small and the map $\mathcal{E}$ has a spectral
gap between the largest eigenvalue, equal to unity, and the next largest eigenvalue.$^a$ We write
the spectrum of $\mathcal{E}$ as $\lambda_1, \lambda_2, \ldots$ with $\lambda_1 = 1$ and $\lambda_2, \ldots$ all bounded in absolute value by some
$\lambda < 1$. We can equivalently consider the operator $\hat{\mathcal{E}} := \sum_{s=1}^{D} A(s) \otimes A(s)^*$.

In this paper we consider the case in which the operators $A(s)$ are proportional to unitary
operators:

$$A(s) = \frac{1}{\sqrt{D}}\, U(s). \qquad (2)$$

Then the expander map can be implemented by choosing $s$ uniformly at random from $\{1, \ldots, D\}$,
and then applying $U(s)$ to the quantum state. The natural generalization of this process, in
which we consider $k$ copies of a quantum system, choose a unitary at random, and apply the
unitary to all $k$ copies, will be called a $k$-copy tensor product expander. We will show that
these give a way to approximate $t$-designs for $t = k$.

Random walks on expander graphs can be viewed similarly, as acting on a distribution
with a randomly chosen permutation matrix. Consider a directed graph, where each node has
$D$ edges leaving it. Label the edges from 1 up to $D$ such that each label appears exactly once
among the incoming edges of each vertex and exactly once among the outgoing edges of each
vertex. Then, for each edge label $s$, $1 \le s \le D$, define a permutation $\pi_s$, where $\pi_s(i) = j$ if
a directed edge with label $s$ goes from node $i$ to node $j$. Then, given a random walk on the
graph, the probability distribution $p(i)$ changes in a single step by

$$p(i) \to \frac{1}{D} \sum_{s=1}^{D} \sum_{j=1}^{N} P(s)_{ij}\, p(j), \qquad (3)$$

where $P(s)$ is the permutation matrix corresponding to the permutation $\pi_s$; i.e. $P(s)_{ij} = 1$
if $\pi_s(j) = i$ and 0 otherwise.
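The update rule (3) can be sketched in a few lines. The example graph (a 5-cycle with the two shift permutations as edge labels) and the function names are assumptions made purely for illustration; nodes are numbered from 0, and exact fractions are used so that probabilities sum to exactly one.

```python
from fractions import Fraction

def permutation_matrix(pi):
    """P[i][j] = 1 if pi(j) = i, and 0 otherwise (nodes are 0-based)."""
    n = len(pi)
    return [[1 if pi[j] == i else 0 for j in range(n)] for i in range(n)]

def walk_step(perms, p):
    """One step of Eq. (3): p(i) -> (1/D) * sum_s sum_j P(s)_{ij} p(j)."""
    D, N = len(perms), len(p)
    mats = [permutation_matrix(pi) for pi in perms]
    return [sum(m[i][j] * p[j] for m in mats for j in range(N)) / D
            for i in range(N)]

# Hypothetical example: 5-cycle, with pi_1 = shift by +1 and pi_2 = its inverse.
N = 5
shift_up = tuple((i + 1) % N for i in range(N))
shift_down = tuple((i - 1) % N for i in range(N))
perms = [shift_up, shift_down]

start = [Fraction(1)] + [Fraction(0)] * (N - 1)  # walker starts at node 0
after_one = walk_step(perms, start)

mixed = start
for _ in range(200):
    mixed = walk_step(perms, mixed)
```

After one step the walker sits on either neighbour of node 0 with probability 1/2 each, and after many steps the distribution is close to uniform, as expected for a walk on an odd cycle.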

Hermitian expanders: It is sometimes convenient to guarantee that an expander we construct
is Hermitian. To obtain Hermitian $\mathcal{E}$ in the quantum case, we impose

$$U(s + D/2) = U(s)^\dagger. \qquad (4)$$

Similarly, in the classical case, we impose

$$\pi_s = \pi_{s+D/2}^{-1}. \qquad (5)$$

This turns the directed graph into an undirected graph. For notational convenience, we
identify $s + D$ with $s$ throughout this paper, so that $s$ is a periodic variable with period $D$.
Note that this constraint (4) requires that $D$ be even. There do exist other ways to construct
Hermitian expanders with odd $D$, if for some $s$ we have $U(s) = U(s)^\dagger$.

1.2  Application  to  state  randomization 

For classical expanders, an important implication of the spectral gap is that random walks
on an expander graph rapidly approach the stationary distribution. Similarly, quantum
expanders can be shown to be rapidly mixing. This has application to the problem of state
randomization, in which classical randomness is used to map a quantum state to an output
that is close in trace distance to the maximally mixed state. Ideally the constructions would
be [computationally] efficient, meaning they run in time polynomial in the number of qubits,
and would use as few random bits as possible.

$^a$ In the non-Hermitian case discussed below, we define the gap instead to be one minus the
second-largest singular value of the map $\mathcal{E}$.

To make this concrete, suppose that $\mathcal{E}$ is Hermitian and unital with gap $1 - \lambda$, and consider
a quantum state $\rho$. We wish to bound the trace norm distance between the maximally mixed
state and the state $\mathcal{E}^m(\rho)$ obtained by acting on $\rho$ with some high power, $m$, of the map $\mathcal{E}$.
The calculation exactly follows the classical case. We begin by bounding the $\ell_2$ distance. For
a matrix $A$, define $\|A\|_2 = \sqrt{\mathrm{tr}\, A^\dagger A}$ and $\|A\|_1 = \mathrm{tr}\, |A| = \mathrm{tr} \sqrt{A^\dagger A}$. Then

$$\Big\| \mathcal{E}^m(\rho) - \frac{I}{N} \Big\|_2^2 \le |\lambda|^{2m}, \qquad (6)$$

as may be shown by writing $\rho$ as a linear combination of eigenvectors of $\mathcal{E}$, and then by
Cauchy-Schwarz,

$$\Big\| \mathcal{E}^m(\rho) - \frac{I}{N} \Big\|_1 \le \sqrt{N}\, |\lambda|^m. \qquad (7)$$

Thus, to obtain a given bound on the trace norm distance $\epsilon$, it suffices to take

$$m \ge \log_\lambda(\epsilon/\sqrt{N}). \qquad (8)$$

This implies that the set of unitaries, consisting of all unitaries of the form $U(s_1) U(s_2) \cdots U(s_m)$,
gives an $\epsilon$-approximate 1-design using

$$K := D^m = \Big( \frac{N}{\epsilon^2} \Big)^{\frac{1}{2} \log(D)/\log(1/\lambda)} \qquad (9)$$

unitaries.
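To make the bookkeeping in (8) and (9) concrete, here is a small sketch that computes the number of steps $m$, the resulting number of unitaries $K = D^m$, and the exponent $\frac{1}{2}\log(D)/\log(1/\lambda)$. The function names and the sample parameters ($N = 2^{20}$, $\epsilon = 0.01$, $D = 8$, $\lambda = 1/2$) are hypothetical and are not taken from any particular expander construction.

```python
import math

def steps_needed(N, eps, lam):
    """Smallest integer m with sqrt(N) * lam**m <= eps, i.e. m >= log_lam(eps / sqrt(N))."""
    return max(0, math.ceil(math.log(eps / math.sqrt(N)) / math.log(lam)))

def unitaries_needed(N, eps, D, lam):
    """K = D**m, the number of length-m strings of unitaries, as in Eq. (9)."""
    return D ** steps_needed(N, eps, lam)

def efficiency_exponent(D, lam):
    """The exponent (1/2) * log(D) / log(1/lam) appearing in Eq. (9)."""
    return 0.5 * math.log(D) / math.log(1.0 / lam)

# Hypothetical parameters, chosen only to illustrate the arithmetic.
N, eps, D, lam = 2 ** 20, 0.01, 8, 0.5
m = steps_needed(N, eps, lam)
K = unitaries_needed(N, eps, D, lam)
exponent = efficiency_exponent(D, lam)
```

Because $m$ must be rounded up to an integer, the computed $K$ can exceed $(N/\epsilon^2)^{\text{exponent}}$ by a factor of up to $D$.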

The exponent $\frac{1}{2} \log(D)/\log(1/\lambda)$ can be thought of as a measure of the efficiency of an
expander, meaning the number of bits of randomness it requires to achieve a certain amount
of state randomization. Before showing how to evaluate $\frac{1}{2} \log(D)/\log(1/\lambda)$, we review other
methods of $\ell_1$ state randomization. The simplest is to apply one of $N^2$ generalized Pauli
operators. This can be done efficiently (i.e. in time $\mathrm{polylog}(N)$) and perfectly randomizes any
state (i.e. $\epsilon = 0$). However, it uses far more randomness than necessary when $\epsilon > 0$. Choosing
$K = O(N \epsilon^{-2} \log(1/\epsilon))$ random unitaries was shown to suffice in [10], improving a result of
[11] (both of which in fact addressed the more difficult problem of $\ell_\infty$ state randomization).
Similarly an efficient $K = 4N\epsilon^{-2}$ construction was given in [12], which uses less randomness
than the efficient constructions of [13] and even than the inefficient constructions based on
random unitaries. We note in passing that the constructions in [12, 13] are based on expanders
with $\lambda = \epsilon/\sqrt{N}$ and $D = K$.

An expander-based state randomization scheme will be efficient if the underlying expander
is efficient, and the number of unitaries it uses will be given by (9). Unfortunately
$\frac{1}{2} \log(D)/\log(1/\lambda)$ is larger than 2 for all known efficient constant-degree expander constructions [5,
6, 7] (e.g. for the Margulis expander [6], it is $\approx 8.4$, and for the zig-zag product [5] it is $2 + o(1)$).
However, if $U(1), \ldots, U(D/2)$ are chosen at random with $U(s + D/2) = U(s)^\dagger$ then Ref. [19]
showed that with high probability $\frac{1}{2} \log(D)/\log(1/\lambda) \approx 1 + O(\log(N) N^{-1/6}) + 2/\log(D)$, and
thus that $K$ is within a small multiplicative factor of $N/\epsilon^2$.

We summarize the above discussion as follows:

Theorem 1 For any $N$ and any $\epsilon > 0$, consider a set of unitaries $U_1, \ldots, U_K \in \mathcal{U}_N$,
which are taken to be strings of unitaries drawn from a set of $D/2$ unitaries $U(1), \ldots, U(D/2)$
and their conjugates for any $D \ge 4$. Then, for most choices of $U(1), \ldots, U(D/2)$, one can choose the
string length such that

$$K = \Big( \frac{N}{\epsilon^2} \Big)^{1 + O(N^{-1/6} \log(N)) + 2/\log(D)} \qquad (10)$$

and

$$\Big\| \frac{1}{K} \sum_{s=1}^{K} U_s\, \rho\, U_s^\dagger - \frac{I}{N} \Big\|_1 \le \epsilon,$$

for all $N$-dimensional density matrices $\rho$.

If we take $D \approx 4N/\epsilon^2$ then Theorem 1 can be thought of as tightening the analysis
of random unitaries from [10, 11, 12], so that only $(4 + o(1)) N/\epsilon^2$ random unitaries are
necessary. This shows that Haar-uniform unitaries require almost exactly the same amount
of randomness as the construction of [12], although they have the substantial disadvantage of
requiring $\mathrm{poly}(N)$ time to implement instead of $\mathrm{poly}(\log(N))$ time. Since
$\lambda \ge (2\sqrt{D-1}/D - O(1/N)) \cdot (1 - O(\log\log(N)/\log(N)))$ for any quantum expander that includes its own inverses
[19], one can show that $4N/\epsilon^2$ is the minimum possible value of $K$ for any expander-based
randomizing map.

Apart from random unitaries and the large-$D$ constructions of [13, 12], we know of one
other class of quantum expanders for which $\frac{1}{2} \log(D)/\log(1/\lambda) \approx 1$. These are obtained by
applying the prescription of [7] to the $SU(2)$ expanders described by Lubotzky, Phillips and
Sarnak in [14]. Such expanders exist for any $N$ whenever $D$ is odd and $2D - 1$ is prime, and
satisfy $\lambda = 2\sqrt{D-1}$ exactly. Thus, they provide another $K \approx 4N/\epsilon^2$ method of performing
state randomization. However, the only claimed efficient construction of these expanders [15]
has an incomplete proof.

In the non-Hermitian case, (6) holds when $\lambda$ is the second-largest singular value of an
expander. If $U(1), \ldots, U(D)$ are chosen uniformly at random, then [19] proved that with high
probability the singular values of $\mathcal{E}^m$ for $m = O(N^{1/6})$ are bounded by $N^2 (1/\sqrt{D})^m (1 + o(1))$.
This implies that the second-largest eigenvalue of $\mathcal{E}$ is $\le \frac{1}{\sqrt{D}} (1 + O(\log(N) N^{-1/6}))$, but does
not yield meaningful bounds on the second-largest singular value of $\mathcal{E}$. Indeed, Tobias Osborne
has pointed out that when $m = 1$ and $D = 2$, the second largest singular value is equal to
unity. If $\mathcal{E}^m$ turned out to have singular values nearly equal to $D^{-m/2}$ then it would imply
that $\approx N/\epsilon^2$ random unitaries sufficed to $\epsilon$-randomize a state.

We now turn to tensor product expanders, considering classical tensor product expanders
in Section 2 and quantum tensor product expanders in Section 3. The mixing analysis above
generalizes in the tensor product case to give approximate $t$-designs. We will describe
randomized constructions of both classical and quantum tensor product expanders. Our basic
tool to prove that a random construction gives an expander with high probability is the trace
method (see, for example [8, 18]). The basic idea of the trace method is to bound eigenvalues
of some linear operator by bounding the trace of high powers of that operator. For example,
for a positive definite Hermitian operator whose two largest eigenvalues are equal to unity
and to $\lambda$, the trace of the $m$th power is at least equal to $1 + \lambda^m$, so by bounding the trace
we bound $\lambda$. We focus on high powers of the operator so that the trace will be dominated
by the largest eigenvalues. The trace method will be adapted, with slight modifications, to
the various cases, depending on whether the setting is classical or quantum, and depending
on whether we consider an expander or a tensor product expander.

2  Classical  Tensor  Product  Expanders 

In  this  section  we  define  classical  tensor  product  expanders,  and  give  a  random  construction 
of  them.  We  then  show  an  application  of  them  to  constructing  quantum  expanders. 

2.1  Preliminaries,  Definitions  and  Applications 

We define an $(N, D, \lambda, k)$ classical $k$-copy tensor product expander to be a set of $N$-by-$N$
permutation matrices $P(s)$, $1 \le s \le D$, with the property that the matrix $L_k$, defined by

$$L_k = \frac{1}{D} \sum_{s=1}^{D} P(s)^{\otimes k}, \qquad (11)$$

has some number, $f_k^N$, of eigenvalues equal to unity, with $f_k^N$ defined below, and then all other
eigenvalues less than or equal to $\lambda$ in absolute value. (Again, if $L_k$ is non-Hermitian then we
consider its singular values.)

We can obtain Hermitian operators $L_k$ by considering $D$ even, and imposing $P(s + D/2) =
P(s)^\dagger$. To obtain Hermitian $L_k$ for $D$ odd, we can instead impose $P(s) = P(s)^\dagger$; that is, the
permutation matrices correspond to perfect matchings. Both models correspond to models
of random graphs for $k = 1$ discussed in [9].

These expanders can also be defined by graphs with $N^k$ nodes, labelled $(n_1, n_2, \ldots, n_k)$,
where $1 \le n_a \le N$. There is an edge from one node $(n_1, \ldots, n_k)$ to another node $(n'_1, \ldots, n'_k)$
if and only if one of the given permutations sends $n_1 \to n'_1, \ldots, n_k \to n'_k$. We refer to this
graph as $G_k$. Alternatively, we can regard $n_1, \ldots, n_k$ as $k$ different random walkers executing
a correlated random walk on the original graph.

The function $f_k^N$ is defined to be equal to the number of unit eigenvalues of the operator

$$\frac{1}{N!} \sum_{\pi \in S_N} P_\pi^{\otimes k}, \qquad (12)$$

where the sum ranges over all permutations $\pi$, and $P_\pi$ is the permutation matrix corresponding
to permutation $\pi$. Since this operator performs an average over a group action, it is a
projector. Applying it to a computational basis state $|n_1, \ldots, n_k\rangle$ maps it to the superposition
of all $|n'_1, \ldots, n'_k\rangle$ such that $n'_i = n'_j$ iff $n_i = n_j$. Thus we can represent eigenstates by
partitions of $\{1, \ldots, k\}$ into $\le N$ blocks, such that indices are equal within blocks and unequal
across blocks. For example, $f_1^N = 1$, $f_2^N = 2$ (corresponding to the sum of all states with
$n_1 = n_2$ and the sum of all states with $n_1 \neq n_2$), $f_3^N = 5$ (corresponding to the possibilities
$n_1 = n_2 = n_3$, $n_1 = n_2 \neq n_3$, $n_1 = n_3 \neq n_2$, $n_2 = n_3 \neq n_1$, and $n_1 \neq n_2 \neq n_3 \neq n_1$), and so
on. Note that if $N \ge k$ then the constraint that there be $\le N$ blocks becomes superfluous,
and $f_k^N$ becomes simply the $k$th Bell number $B_k$, which counts the total number of ways of
partitioning a $k$-element set.
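The counting argument above can be checked directly for small cases. The sketch below, with hypothetical function names, computes $f_k^N$ in two ways: as the number of set partitions of a $k$-element set into at most $N$ blocks (via the standard recurrence for Stirling numbers of the second kind), and by brute-force enumeration of the distinct equality patterns among all tuples $(n_1, \ldots, n_k)$.

```python
from itertools import product

def count_partitions(k, max_blocks):
    """Number of set partitions of a k-element set into at most max_blocks blocks."""
    if k == 0:
        return 1
    # stirling[j] = number of partitions of the elements seen so far into exactly j blocks
    stirling = [1] + [0] * k
    for _ in range(k):
        new = [0] * (k + 1)
        for j in range(1, k + 1):
            # the new element either joins one of j existing blocks or opens a new block
            new[j] = j * stirling[j] + stirling[j - 1]
        stirling = new
    return sum(stirling[1:min(k, max_blocks) + 1])

def count_equality_patterns(k, N):
    """Brute force: number of distinct equality patterns among tuples in {0..N-1}^k."""
    patterns = set()
    for tup in product(range(N), repeat=k):
        first_seen = {}
        # relabel each value by the order in which it first appears
        patterns.add(tuple(first_seen.setdefault(x, len(first_seen)) for x in tup))
    return len(patterns)
```

For $N \ge k$ the first function returns the Bell numbers 1, 2, 5, 15, ..., while for $N < k$ the block constraint matters; for example there are only 4 patterns for $k = 3$ and $N = 2$.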

Any matrix $L_k$ of the form (11) is block diagonal with $f_k^N$ different blocks depending
on the symmetry of the elements $n_1, \ldots, n_k$ under permutation; we call these subspaces
$S_1, S_2, \ldots, S_{f_k^N}$. By the arguments of the above paragraph, we can write the projector in
(12) as

$$\sum_{a=1}^{f_k^N} |u_a\rangle\langle u_a|,$$

for some unit vectors $|u_a\rangle \in S_a$. These $|u_a\rangle$ are eigenvectors with unit eigenvalue not only of
(12) but also of any $L_k$.

Rapid mixing: Given the spectral gap, applying a classical tensor product
expander many times (of order $k \log(N)$) generates an approximately $k$-wise independent
permutation. This means that the results of applying it to $k$ distinct elements are almost
indistinguishable from applying a single permutation to each of the $k$ elements. More precisely,
given an initial probability distribution, $p$, in any of the different subspaces $S_a$, we have

$$\| L_k^m\, p - u_a \|_1 \le N^{k/2} \lambda^m, \qquad (13)$$

where $u_a$ is the $\ell_1$ normalized eigenvector with eigenvalue unity in this subspace. This approach
towards generating $k$-wise independent permutations has also been considered in [16].

Expanders are not always tensor product expanders. The requirement that a set of permutations
form a tensor product expander for $k > 1$ copies is more stringent than the requirement
for $k = 1$ copy, as it implies that the correlations between elements are destroyed
by the expander. For an example of a classical expander that does not give a tensor product
expander, consider any set of $D$ permutation matrices, $P(s)$, on $N$ elements that gives a
classical expander. Define a new set of permutation matrices, $P'(s)$, on $2N$ elements, such
that $P'(s) = P(s) \oplus P(s)$ for $s = 1, \ldots, D$. Finally, define the permutation $P'(D + 1)$ which
sends $i$ to $i + N$ if $i \le N$, and sends $i$ to $i - N$ if $i > N$. Then, these $D + 1$ different
permutation matrices define a $k = 1$ expander (they simply correspond to two copies of the
original graph, with the possibility of moving between the two copies by using permutation
matrix $P'(D + 1)$), but do not define a $k = 2$ expander: if two walkers, $n_1, n_2$, originally are
in the same copy as each other, then they remain in the same copy.

Another example comes from Cayley graphs. If $G$ is a group with generators $g_1, \ldots, g_D$
then the Cayley graph on $G$ is defined by taking $N = |G|$ and $P(s)|g\rangle = |g_s g\rangle$ for $s =
1, \ldots, D$. There are many Cayley graph expanders known (cf. Section 11 of [8]), but applying
$P(s) \otimes P(s)$ to any $|g\rangle \otimes |h\rangle$ produces a new state $|g'\rangle \otimes |h'\rangle$ with $g'^{-1} h' = g^{-1} h$. Thus, no
Cayley graph expander can be a tensor product expander unless it is modified in some way.
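The conserved quantity $g^{-1}h$ in the Cayley graph example can be verified by brute force on a small group. The sketch below uses $S_3$ with an arbitrarily chosen generating set; the group, the generators and the function names are all assumptions for illustration.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q: apply q first, then p (permutations stored as tuples)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def relative_position(g, h):
    """The quantity g^{-1} h that two correlated walkers on a Cayley graph preserve."""
    return compose(inverse(g), h)

group = list(permutations(range(3)))
generators = [(1, 0, 2), (1, 2, 0), (2, 0, 1)]  # hypothetical generating set of S_3

def invariant_is_preserved():
    """Check that P(s) ⊗ P(s): |g>|h> -> |g_s g>|g_s h> never changes g^{-1} h."""
    for gs in generators:
        for g in group:
            for h in group:
                g_new, h_new = compose(gs, g), compose(gs, h)
                if relative_position(g_new, h_new) != relative_position(g, h):
                    return False
    return True

# Each value of g^{-1} h labels an invariant sector of the two-walker process.
num_sectors = len({relative_position(g, h) for g in group for h in group})
```

Here the two-walker process splits into $|G| = 6$ invariant sectors, more than the $f_2^N = 2$ unit eigenvalues a two-copy tensor product expander is allowed to have.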

The limit of large $k$: Observe that any $k$-copy tensor product expander is also a $k'$-copy
tensor product expander for all $k' \le k$. On the other hand, even if $k > N$ then the $k$
walkers can still occupy only at most $N$ positions. Thus if a map is an $N$-copy tensor product
expander then it is also a $k$-copy tensor product expander for all $k$.

An equivalent condition to $\{\pi_1, \ldots, \pi_D\} \subset S_N$ being an $N$-tensor product expander is that
the Cayley graph generated by $\{\pi_1, \ldots, \pi_D\}$ is an expander. The spectrum of this Cayley
graph is identical (up to multiplicity) to that of $L_k$ for all $k \ge N$ (with $P(s)$ defined to be
$P_{\pi_s}$).


2.2  Random  permutations  are  tensor  product  expanders 

The question then naturally arises whether $k > 1$ tensor product expanders actually exist.
Of course there is a trivial $D = N!$ construction where we take $\{\pi_1, \ldots, \pi_{N!}\} = S_N$ and
achieve $\lambda = 0$ for all $k$. We would prefer, though, that $D = O(1)$. The construction of [16]
nearly achieves this with $D = \mathrm{polylog}(N)$ and $\lambda = 1 - 1/\mathrm{poly}(k, \log N)$. For a constant
degree construction, we can use Kassabov's expander [17] on $S_N$. This achieves $D = O(1)$
and $\lambda$ equal to a constant strictly smaller than 1 for all $N$ and $k$. Additionally, it can be
implemented in time $\mathrm{polylog}(N)$.

In this section, we give a randomized construction of tensor product expanders for any
even $D \ge 4$ and with $\lambda \approx \lambda_H^{1/(k+1)}$, where

$$\lambda_H := \frac{2\sqrt{D-1}}{D}. \qquad (14)$$

Theorem 2 Choose $\pi_1, \ldots, \pi_{D/2} \in S_N$ at random and then take $\pi_{s+D/2} = \pi_s^{-1}$. Let
$P(s) = P_{\pi_s}$. For any $k$, let $\lambda$ denote the $(f_k^N + 1)$st largest eigenvalue of $L_k$. Then for any

P,-[a  >  c(A *  +  0(‘°g(t)  ) ')ff0g(jV))))]  <  (15) 

where $\Pr[\ldots]$ denotes probability and $\lambda_H$ depends on $D$ as given in Eq. (14).

Note that since $\lambda_H^{1/(k+1)}$ converges to unity as $k$ becomes large, the result (15) is only
meaningful for $k = O(\log(N)/\log(\log(N)))$. Constants depending on $D$ are also hidden
inside of the $O(\ldots)$ notation. The result is likely far from optimal, since numerical studies
indicate that for fixed $k$ and large $N$, the largest non-trivial eigenvalue $\lambda$ approaches $\lambda_H$. This
result for the case $k = 1$ was only recently proven [9]. Our proof, which gives a weaker bound
on the expectation value of $\lambda$, roughly follows the presentation of the trace method in [8, 18],
with some modifications.

Proof of Theorem 2: We will apply the trace method separately in each of the subspaces
$S_a$. It suffices to consider only one such subspace $S_a$, the subspace $S_{f_k^N}$ in which all of the
$n_1, n_2, \ldots, n_k$ differ from each other, since every eigenvalue of $L_k$ is an eigenvalue of $L_k$
restricted to $S_{f_k^N}$. For example, consider the case $k = 2$. We have two different subspaces,
one with $n_1 = n_2$ and one with $n_1 \neq n_2$. The eigenvectors of the first subspace, of the form
$\sum_i p(i) |i\rangle|i\rangle$, correspond to eigenvectors of $L_1$ of the form $\sum_i p(i) |i\rangle$. Given such an eigenvector,
we can construct an eigenvector in the second subspace equal to $\sum_i \sum_{j \neq i} p(i) |i\rangle|j\rangle$ with
the same eigenvalue, as claimed.

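Let $E[\ldots]$ denote an average over different choices of permutation matrices. Then for any
even $m$,

$$E[|\lambda|] \le \big( E[\mathrm{tr}(L_k^m R)] - 1 \big)^{1/m}, \qquad (16)$$

where $R$ is the projector onto the given subspace. The expectation value $E[\mathrm{tr}(L_k^m R)]$ equals

$$\Big( \frac{1}{D} \Big)^m \sum_{s_1=1}^{D} \sum_{s_2=1}^{D} \cdots \sum_{s_m=1}^{D} E[\mathrm{tr}(P(s_1) P(s_2) \ldots P(s_m) R)]. \qquad (17)$$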

If for some $i$ we have $s_i = s_{i+1} + D/2$, then $P(s_i)P(s_{i+1}) = I$, and we can remove that pair
of permutation matrices from the trace above. Similarly, if $s_m = s_1 + D/2$, then we can
remove the first and last permutation matrices from the trace, exploiting the cyclic invariance
of the trace and the vanishing commutator $[P(s), R] = 0$. We can consider these operations
as acting on a word $s_1, s_2, \ldots, s_m$ on an alphabet $\{1, \ldots, D\}$. We define a reduced word by
removing pairs of letters of the form $s, s + D/2$. Similarly, if the word ends with a letter $s$
and begins with a letter $s + D/2$, we remove this pair also. We repeat these removals until no
further removals are possible. The result is a reduced word of length $m^0 \le m$; the resulting
sequence we write $s'_1, s'_2, \ldots, s'_{m^0}$. There are at most

$$(D-1)^{m/2}\, 2^m = \lambda_H^m D^m \tag{18}$$

choices of $s_1, \ldots, s_m$ which give $m^0 = 0$; the number of these choices is equal to $D^m$ times the
return probability of a random walk of length $m$ on a Cayley tree of degree $D$. For these
choices, we have $E[\mathrm{tr}(P(s_1)P(s_2)\cdots P(s_m)R)] = \mathrm{tr}(R) \le N^k$.
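The word reduction described above can be sketched in a few lines. This is only an illustration under stated assumptions: letters are integers 1, ..., D, the partner of a letter s is taken to be s + D/2 wrapped back into 1, ..., D (so that partners correspond to inverse permutation matrices), and the function names `partner`, `reduce_word` and `count_fully_reducible` are hypothetical, not from the text.

```python
import itertools


def partner(s, D):
    """Letter paired with s: s + D/2, wrapped back into 1..D (assumed convention)."""
    half = D // 2
    return s + half if s <= half else s - half


def reduce_word(word, D):
    """Cyclically reduce a word by deleting adjacent (and first/last) partner pairs."""
    stack = []
    for s in word:
        if stack and stack[-1] == partner(s, D):
            stack.pop()          # adjacent pair s, s + D/2 cancels
        else:
            stack.append(s)
    lo, hi = 0, len(stack) - 1
    # Cyclic step: strip matching first/last letters while they are partners.
    while lo < hi and stack[lo] == partner(stack[hi], D):
        lo += 1
        hi -= 1
    return stack[lo:hi + 1]


def count_fully_reducible(D, m):
    """Number of words of length m over {1..D} whose reduced word is empty (m^0 = 0)."""
    letters = range(1, D + 1)
    return sum(1 for w in itertools.product(letters, repeat=m)
               if not reduce_word(list(w), D))
```

For example, with D = 4 and m = 4 the brute-force count is 28, which is the number of closed walks of length 4 on a 4-regular tree and lies below the bound (D - 1)^{m/2} 2^m = 144 of Eq. (18).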

We now consider the other choices of $s_1, \ldots, s_m$, where $m^0 > 0$. In general,

$$E[\mathrm{tr}(P(s'_1)P(s'_2)\cdots P(s'_{m^0})R)] \le N^k E[\mathrm{tr}(P(s'_1)P(s'_2)\cdots P(s'_{m^0})R_{1,2,\ldots,k})], \tag{19}$$

where $R_{1,2,\ldots,k}$ projects onto the state with $n_1 = 1, n_2 = 2, \ldots, n_k = k$. To compute this
expectation value, we define $v_0^a = a$, for $1 \le a \le k$. Then, define $v_i^a$, for $i \ge 1$ and $1 \le a \le k$,
to be $\pi_{s'_i}(v_{i-1}^a)$. Then, the probability that $v_{m^0}^a = a$ for all $a$ is equal to the desired result.
We compute this probability as follows. Consider this as happening sequentially, where first
we define $v_1^a$ for all $a$, then we define $v_2^a$, and so on. We say that a choice of $v_i^a$ is "free" if at
no previous step $j < i$ did we compute $\pi_{s'_j}(v_{j-1}^b)$ with $s'_j = s'_i$ and $v_{j-1}^b = v_{i-1}^a$. If a choice
of $v_i^a$ is free, and if $t$ values of $\pi_{s'_i}$ have been previously revealed, then we can simply pick
$v_i^a$ at random from the $N - t$ possibilities, thus revealing some of the information about the
permutation $\pi_{s'_i}$, and increasing $t$ by one for that permutation. If a choice is not free, then it
is "forced", in which case we have no choice about the value of $\pi_{s'_i}(v_{i-1}^a)$.

We say that a coincidence occurs at step $i$ for walker $a$ if this is a free step and the
randomly selected vertex coincides with a previously selected vertex (previously selected by
any of the walkers). Note that for $v_{m^0}^a$ to equal $a$ for all $a$, we must have at least $k$ coincidences.
There are two cases: either there are at least $k + 1$ coincidences, or else there are exactly $k$
coincidences.

The probability of there being at least $k + 1$ coincidences can be computed as follows. Let
$i_1, i_2, \ldots, i_{k+1}$ be the steps of the first $k+1$ coincidences and $a_1, a_2, \ldots, a_{k+1}$ be the corresponding
walkers. The probability of having these coincidences for given $i_1, \ldots$ and $a_1, \ldots$ is bounded by
$(mk/(N - mk))^{k+1}$. Summing over all possible steps and walkers, we find that the probability
of having at least $k+1$ coincidences is bounded by

$$m^{k+1} k^{k+1} (mk/(N - mk))^{k+1}. \tag{20}$$

If there are exactly $k$ coincidences, then each walker has exactly one coincidence given
that $v_{m^0}^a = a$ for all $a$. There are two possibilities: either all of the coincidences occur on the
last step, or at least one coincidence does not occur on the last step. The probability of the
first case is at most $(1/(N - mk))^k$. If at least one coincidence does not occur on the last step,
then let walker $b$ be the first walker to have a coincidence, occurring on step $j$. Note that
each of the vertices $1, \ldots, k$ must be the randomly selected vertex on exactly one coincidence,
again given that $v_{m^0}^a = a$ for all $a$. Because there are no further coincidences for walker $b$, we
have $s'_i = s'_{i+j}$ for all $i$. The fraction of reduced words of length $m^0$ that obey this constraint
for given $j \le m^0/2$ is at most $(D - 1)^{-m^0/2}$. The fraction of words that have a reduced
word of length $m^0$ is at most $(D - 1)^{m^0/2}\lambda_H^m$. Therefore, the fraction of words that have a
reduced word obeying this constraint, after summing over $j$, is at most $m\lambda_H^m$. The probability
of having these coincidences is bounded by $(m/(N - mk))^k$, where the factor of $m$ arises from
the choice of step on which the coincidence occurs (this is in fact a large overestimate). The
product of these probabilities is $m\lambda_H^m (m/(N - mk))^k$. The total of these two possibilities is

$$(1/(N - mk))^k + (m/(N - mk))^k\, m\lambda_H^m. \tag{21}$$

Adding the sum of the expectation value over words with $m^0 = 0$ (which is bounded by
$N^k\lambda_H^m$ by Eq. (18)) to $N^k$ times the sum of (20) and (21), we find that

$$E[\mathrm{tr}(P(s'_1)P(s'_2)\cdots P(s'_{m^0})R)] \le N^k\lambda_H^m + N^k m^{k+1}k^{k+1}(mk/(N - mk))^{k+1} + (N/(N - mk))^k + (Nm/(N - mk))^k\, m\lambda_H^m, \tag{22}$$

and therefore

$$\begin{aligned}
E[\mathrm{tr}(P(s'_1)P(s'_2)\cdots P(s'_{m^0})R)] - 1
 &\le N^k\lambda_H^m + N^k m^{k+1}k^{k+1}(mk/(N - mk))^{k+1} + [(N/(N - mk))^k - 1] + (Nm/(N - mk))^k\, m\lambda_H^m \\
 &= N^k\lambda_H^m + N^k m^{k+1}k^{k+1}(mk/(N - mk))^{k+1} + O(mk^2/N) + (Nm/(N - mk))^k\, m\lambda_H^m.
\end{aligned} \tag{23}$$

We pick

$$m = (k + 1)\log_{1/\lambda_H}(N) \tag{24}$$

to minimize this expectation value, finding

$$\left(E[\mathrm{tr}(P(s'_1)P(s'_2)\cdots P(s'_{m^0})R)] - 1\right)^{1/m} \le \lambda_H^{1/(k+1)}\,(O(mk))^{2(k+1)/m}. \tag{25}$$

Applying Markov's inequality then yields the proof of the Theorem. ∎

2.3  Quantum  expanders  from  classical  tensor  product  expanders 

One application of $k = 2$ classical tensor product expanders is to constructing quantum
expanders. We give two constructions.

The first approach was introduced, but not formally analyzed, in [3]. Let $P(s)$ be a set
of random permutation matrices defining a $k = 2$ tensor product expander, as in the random
construction of a $k = 2$ tensor product expander above. Then, define $\sigma(s)$, for $s = 1, \ldots, D$, to
be a diagonal matrix. For $s = 1, \ldots, D/2$ we choose $\sigma(s)$ to have diagonal entries $\pm 1$ chosen
independently at random and we choose $\sigma(s + D/2) = P(s)\sigma(s)P(s)^\dagger$. Then, in [3] it was
shown numerically that the $A$ matrices,

$$A(s) = \frac{1}{\sqrt{D}} P(s)\sigma(s), \tag{26}$$

define a quantum expander with high probability. Note that the choice of $\sigma(s + D/2)$ is such
that $A(s + D/2) = A(s)^\dagger = (1/\sqrt{D})\,\sigma(s)P(s)^\dagger$, so that this is a Hermitian expander because
$P(s) = P(s + D/2)^\dagger$. Numerically, $\lambda$ was observed to approach $\lambda_H$ for large $N$. We now prove
that we do indeed get a quantum expander with high probability, but with a weaker bound
on $\lambda$.

Theorem 3 Choose $\pi_1, \ldots, \pi_{D/2} \in S_N$ at random and then take $\pi_{s+D/2} = \pi_s^{-1}$. Let
$P(s) = P_{\pi_s}$. Choose $\sigma(s)$ as described above. Let $\lambda$ denote the second largest eigenvalue of
the map with Kraus operators given by the matrices $A(s)$ in Eq. (26). Then, for any $c > 1$,

$$\Pr\left[\lambda \ge c\left(\lambda_H^{1/3} + O\!\left(\frac{\log\log(N)}{\log(N)}\right)\right)\right] \le c^{-3\log_{1/\lambda_H}(N)}. \tag{27}$$

The Hermitian, completely positive map $\mathcal{E}$ defined by the $A$ matrices in (26) sends a
diagonal matrix to a diagonal matrix and an off-diagonal matrix to an off-diagonal matrix.
So, we consider the spectrum of $\mathcal{E}$ in the diagonal and off-diagonal sectors separately. In the
diagonal sector, the spectrum of $\mathcal{E}$ is the same as that of the $k = 1$ expander defined by the
given permutation matrices, and hence has a gap between the largest eigenvalue, equal to
unity, and the next largest eigenvalue.

The off-diagonal sector requires a little more work. We again use the trace method. Let
$\lambda$ be the largest eigenvalue in absolute value in the off-diagonal sector. Let $M(i,j)$ be an
$N$-by-$N$ dimensional matrix with a one in the $i$th row and $j$th column, and zeroes everywhere
else, so that these form a basis for the space of $N$-by-$N$ matrices. The $M(i,j)$ with $i \neq j$
form a basis for the space of off-diagonal matrices. Define $(M, N)$ to be an inner product on
the space of $N^k$-by-$N^k$ dimensional matrices by $(M, N) = \mathrm{tr}(M^\dagger N)$. Then for any even $m$,

$$E[|\lambda|] \le \left(E\Big[\sum_{i \neq j} \big(M(i,j), \mathcal{E}^m(M(i,j))\big)\Big]\right)^{1/m}. \tag{28}$$

Note that compared to Eq. (16), a factor of unity is not subtracted from the expectation value
on the right-hand side of Eq. (28).

The evaluation of the right-hand side of Eq. (28) proceeds analogously to that of Eq. (16).
The computation in the case $m^0 = 0$ is identical. In the case $m^0 > 0$, we again define
coincidences and paths. The only difference is that now rather than just computing the
probability that $v_{m^0}^a = a$ for all $a = 1, 2$, the paths come in with signs which may be plus or
minus one. This can only reduce the contribution of the terms with $m^0 > 0$. We bound the
case with $k + 1$ coincidences as before. We also bound the case with $k$ coincidences not all
occurring on the last step as before. The only difference is the case in which all coincidences
happen on the last step $i = m^0$. The probability of this happening is $(1/N)^2$. The sign,
however, is completely random; it is equally likely to be plus or minus one. Thus, the paths
with exactly $k$ coincidences, all occurring on step $i = m^0$, contribute zero to the expectation
value (28). Thus,

$$E[|\lambda|^m] \le (N^k + m)\lambda_H^m + N^k m^{k+1} k^{k+1} (mk/(N - mk))^{k+1}. \tag{29}$$

Picking $m$ as before, we find that $E[|\lambda|] \le \lambda_H^{1/3}(1 + O(\log(\log(N))/\log(N)))$. Applying
Markov's inequality yields the theorem. ∎
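The structure of the first construction can be checked numerically with a small sketch. It is not the authors' code: it works with the unnormalized signed permutation matrices B(s) = sqrt(D) A(s) to stay in exact integer arithmetic, indexes s from 0 instead of 1, and all function names are hypothetical. It verifies only the two algebraic facts used in the text, namely that the second half of the Kraus operators are the adjoints of the first half and that the Kraus operators sum to the identity after normalization; it says nothing about the spectral gap.

```python
import random


def perm_matrix(pi):
    """Matrix P with P|j> = |pi[j]>, i.e. entry (pi[j], j) equals 1."""
    n = len(pi)
    return [[1 if pi[j] == i else 0 for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]


def signed_permutations(N, D, rng):
    """Return B(0..D-1) with B(s) = P(s) sigma(s) and B(s + D/2) built as in the text."""
    half = D // 2
    B = [None] * D
    for s in range(half):
        pi = list(range(N))
        rng.shuffle(pi)
        P = perm_matrix(pi)
        sigma = diag([rng.choice((-1, 1)) for _ in range(N)])
        P_inv = transpose(P)                             # P(s + D/2) = P(s)^dagger
        sigma_partner = matmul(matmul(P, sigma), P_inv)  # sigma(s + D/2) = P sigma P^dagger
        B[s] = matmul(P, sigma)
        B[s + half] = matmul(P_inv, sigma_partner)
    return B


def kraus_sum(B):
    """Sum over s of B(s)^T B(s); equals D * I when the A(s) = B(s)/sqrt(D) are Kraus operators."""
    n = len(B[0])
    total = [[0] * n for _ in range(n)]
    for b in B:
        prod = matmul(transpose(b), b)
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, prod)]
    return total
```

With, say, `B = signed_permutations(5, 4, random.Random(0))`, one finds `B[s + 2] == transpose(B[s])` for s = 0, 1, matching A(s + D/2) = A(s)^dagger, and `kraus_sum(B)` equal to 4 times the identity.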

We  now  describe  our  second  construction  of  a  quantum  expander  from  a  classical  tensor 
product  expander. 

Theorem 4 Suppose $\{P(1), \ldots, P(D)\}$ form a $(N, D, 1 - \epsilon, 2)$ classical tensor product
expander (i.e. $k = 2$). Assume that $N > 2$. Let

$$Z = \sum_{j=1}^{N} e^{2\pi i j/N}\, |j\rangle\langle j|$$

and $p = 1/(1 + \epsilon)$. Define a quantum operation $\mathcal{E}(M)$ with $D + 1$ Kraus operators
$\sqrt{p/D}\,P(1), \ldots, \sqrt{p/D}\,P(D), \sqrt{1 - p}\,Z$. Then $\mathcal{E}$ is a $(N, D + 1, 1 - \epsilon/48)$ quantum expander.

Thus, any constant-gap classical 2-TPE can be used to construct a constant-gap quantum
expander. No attempt has been made to optimize the constant 48, which we believe can be
made arbitrarily close to one when $N$ is large and $\epsilon$ is close to 1.

Note that $\{\sqrt{p/D}\,P(1), \ldots, \sqrt{p/D}\,P(D), \sqrt{1 - p}\,Z\}$ is not in general Hermitian, but if
$\{P(1), \ldots, P(D)\}$ is Hermitian then
$\{\sqrt{p/D}\,P(1), \ldots, \sqrt{p/D}\,P(D), \sqrt{(1-p)/2}\,Z, \sqrt{(1-p)/2}\,Z^\dagger\}$ is a Hermitian
$(N, D + 2, 1 - \epsilon/48)$ expander; this is proved by using the triangle inequality to relate
its gap to the gap of the expander in Theorem 4.

Proof of Theorem 4: The idea is that the classical TPE randomizes the diagonal elements
of the density matrix simply because it is an expander, and it randomizes the off-diagonal
elements because it is a $k = 2$ TPE. Next the phase operation $Z$ adds a phase to the
off-diagonal elements so that they are no longer fixed by the classical TPE. Thus the only fixed
state will be the identity matrix.

More formally, let $|\varphi_1\rangle = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} |i\rangle|i\rangle$ and
$|\varphi_2\rangle = \frac{1}{\sqrt{N(N-1)}}\sum_{i \neq j} |i\rangle|j\rangle$, and write $\varphi_i$ for the projector
$|\varphi_i\rangle\langle\varphi_i|$. These two states form an orthonormal basis for the invariant subspace of
$\sum_{s=1}^{D} P(s) \otimes P(s)$. Thus the fact that $P(1), \ldots, P(D)$ form a 2-TPE implies the bound

$$\left\| \frac{1}{D}\sum_{s=1}^{D} P(s) \otimes P(s) - \varphi_1 - \varphi_2 \right\| \le \lambda.$$

Next, a short calculation shows that $\langle\varphi_2|(Z \otimes Z^*)|\varphi_2\rangle = -1/(N - 1)$. Now apply the following
Lemma to the subspace orthogonal to $|\varphi_1\rangle$.

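Lemma 1 Let $\Pi$ be a projector and let $X$ and $Y$ be operators such that $\|X\| \le 1$, $\|Y\| \le 1$,
$\Pi X = X\Pi = \Pi$, $\|(I - \Pi)X(I - \Pi)\| \le 1 - \epsilon_X$ and $\|\Pi Y \Pi\| \le 1 - \epsilon_Y$. Assume $0 < \epsilon_X, \epsilon_Y < 1$.
Then for any $0 < p < 1$, $\|pX + (1 - p)Y\| < 1$. Specifically,

$$\|pX + (1 - p)Y\| \le 1 - \frac{\epsilon_Y}{12}\min(p\epsilon_X, 1 - p). \tag{30}$$

Setting $p = 1/(1 + \epsilon_X)$, we obtain

$$\|pX + (1 - p)Y\| \le 1 - \frac{\epsilon_X \epsilon_Y}{12(1 + \epsilon_X)} \le 1 - \frac{\epsilon_X \epsilon_Y}{24}. \tag{31}$$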
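The step from (30) to (31) is a one-line substitution, spelled out here as a sketch. It assumes that Eq. (30) reads $\|pX + (1-p)Y\| \le 1 - (\epsilon_Y/12)\min(p\epsilon_X, 1-p)$, which is how the garbled display is interpreted in this edition.

```latex
\begin{align*}
p = \frac{1}{1+\epsilon_X}
  \;&\Longrightarrow\;
  p\,\epsilon_X = \frac{\epsilon_X}{1+\epsilon_X} = 1 - p, \\
\frac{\epsilon_Y}{12}\,\min(p\,\epsilon_X,\, 1-p)
  &= \frac{\epsilon_X\,\epsilon_Y}{12\,(1+\epsilon_X)}
  \;\ge\; \frac{\epsilon_X\,\epsilon_Y}{24}
  \qquad \text{since } \epsilon_X < 1 .
\end{align*}
```

With this choice of $p$ the two arguments of the minimum coincide, which is why it is the natural value to pick.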


The Lemma is proved in Appendix 1. We apply the Lemma by taking
$X = \frac{1}{D}\sum_{s=1}^{D} P(s) \otimes P(s) - \varphi_1$, $Y = Z \otimes Z^* - \varphi_1$ and $\Pi = \varphi_2$. Then plugging
$\epsilon_X = \epsilon$ and $\epsilon_Y = 1 - 1/(N - 1) \ge 1/2$ into (31) completes the proof of Theorem 4. ∎
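The "short calculation" used in the proof can be checked numerically. The sketch below assumes that $Z$ is the diagonal phase matrix with entries $e^{2\pi i j/N}$ for $j = 1, \ldots, N$ (an interpretation of the garbled definition in Theorem 4), and the function name `phase_overlap` is hypothetical.

```python
import cmath


def phase_overlap(N):
    """Return <phi_2| Z (x) Z^* |phi_2> for Z = diag(exp(2*pi*i*j/N)), j = 1..N.

    |phi_2> is the uniform superposition of |i>|j> over i != j, so the overlap is
    the average of omega_i * conj(omega_j) over ordered pairs with i != j.
    """
    omega = [cmath.exp(2j * cmath.pi * j / N) for j in range(1, N + 1)]
    total = sum(omega[i] * omega[j].conjugate()
                for i in range(N) for j in range(N) if i != j)
    return total / (N * (N - 1))
```

Because the $N$ phases sum to zero, the sum over $i \neq j$ equals $|\sum_i \omega_i|^2 - N = -N$, and dividing by $N(N-1)$ gives $-1/(N-1)$, in agreement with the value quoted in the proof.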


3  Quantum  Tensor  Product  Expanders 

In  this  section  we  define  quantum  tensor  product  expanders  and  show  that  random  unitaries 
provide  a  way  of  constructing  tensor  product  expanders.  We  begin  with  some  preliminaries 
and  definitions,  present  applications  to  the  Solovay-Kitaev  problem  of  approximating  unitaries 
by  a  string  of  elementary  operations,  and  finally  prove  that  random  unitaries  give  tensor 
product  expanders.  The  proof  of  this  last  statement  begins  in  subsection  3.3;  it  closely 
follows [19] and should be read in conjunction with that paper.

3.1 Preliminaries, Definitions, and Applications

Suppose we have a collection of unitaries $U(1), \ldots, U(D) \in \mathcal{U}_N$. Define a quantum operation
$\mathcal{E}_k$ that applies $U(s)^{\otimes k}$ for $s \in \{1, \ldots, D\}$ chosen uniformly at random. In other words

$$\mathcal{E}_k(M) = \frac{1}{D}\sum_{s=1}^{D} U(s)^{\otimes k} M (U(s)^\dagger)^{\otimes k}, \tag{32}$$

where $M$ is an $N^k \times N^k$ matrix. Since an $N^k \times N^k$ matrix can also be viewed as an
$N^{2k}$-dimensional vector, we can also interpret $\mathcal{E}_k$ as a linear operator on an $N^{2k}$-dimensional vector
space. Define this operator to be

$$\hat{\mathcal{E}}_k := \frac{1}{D}\sum_{s=1}^{D} U(s)^{\otimes k} \otimes (U(s)^*)^{\otimes k}. \tag{33}$$

Note that $\mathcal{E}_k$ and $\hat{\mathcal{E}}_k$ are isospectral.

In previous work [19, 4, 5] $\mathcal{E}_1$ was said to be a $(N, D, \lambda)$ quantum expander if the
second-largest eigenvalue of $\hat{\mathcal{E}}_1$ was $\le \lambda$. In fact, the definition of quantum expanders included even
quantum operations that were not mixtures of unitaries, as long as they could be expressed
using $\le D$ Kraus operators. Here we will change notation from [19, 4, 5] slightly. We say that
a set of unitaries $\{U(1), \ldots, U(D)\}$ is a $(N, D, \lambda, k)$ tensor product expander if the operator
$\hat{\mathcal{E}}_k$ has $F_k^{\mathcal{U}_N}$ (defined below) eigenvalues equal to one, and all of its other eigenvalues have
absolute value $\le \lambda$. This differs from the notation of [19, 4, 5] in that the set of unitaries,
rather than the quantum operation, constitutes the quantum expander.¹ When $N$ and $D$ are
understood, we sometimes simply say that $\{U(1), \ldots, U(D)\}$ are a $k$-tensor product expander
with gap $1 - \lambda$.

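We define $F_k^{\mathcal{U}_N}$ to be the rank of the projector

$$\hat{\mathcal{T}}_k := \int_{U \in \mathcal{U}_N} U^{\otimes k} \otimes (U^*)^{\otimes k}\, dU$$

or equivalently of the operation $\mathcal{T}_k$, which is defined by

$$\mathcal{T}_k(M) = \int_{U \in \mathcal{U}_N} U^{\otimes k} M (U^\dagger)^{\otimes k}\, dU. \tag{34}$$

(Throughout the paper the integration measure $dU$ will be the Haar measure.) This map is
the "twirling" operation [2]. Since $\mathcal{T}_k$ is a Hermitian map and $\mathcal{T}_k(\mathcal{T}_k(M)) = \mathcal{T}_k(M)$, the map
$\mathcal{T}_k$ has all eigenvalues equal to zero or unity.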

For $\pi \in S_k$, we define the $N^k \times N^k$ matrix $P_N(\pi)$ to be

$$P_N(\pi) = \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} |i_1, \ldots, i_k\rangle\langle i_{\pi(1)}, \ldots, i_{\pi(k)}|.$$

Since $P_N(\pi)$ commutes with any matrix of the form $U^{\otimes k}$, it follows that $\mathcal{T}_k(P_N(\pi)) =
\mathcal{E}_k(P_N(\pi)) = P_N(\pi)$ for any $\pi$. We claim that the $P_N(\pi)$ (and their linear combinations)
constitute all of the unit eigenvalues of $\mathcal{T}_k$. This fact follows from Schur-Weyl duality, and
specifically Thm 3.3.8 of [24], which states that $\mathcal{T}_k(M) = M$ if and only if $M$ is a linear
combination of $P_N(\pi)$ operators. Thus $F_k^{\mathcal{U}_N} = \dim \mathrm{Span}\{P_N(\pi) : \pi \in S_k\}$.

¹One can slightly generalize this by defining a set of unitaries and a set of associated probabilities
summing to unity to be a tensor product expander; however in this paper we consider applying each
unitary with equal probability.

An important special case is when $N \ge k$. In this case, the set $\{P_N(\pi)|1, 2, \ldots, k\rangle : \pi \in S_k\}$
is linearly independent, which implies that $\{P_N(\pi) : \pi \in S_k\}$ is linearly independent and
thus that $F_k^{\mathcal{U}_N} = k!$.
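The dimension of the span of the permutation operators can be computed directly for small cases. The following is a brute-force sketch, not part of the original analysis: it flattens each $P_N(\pi)$ into a 0/1 vector and computes the rank by exact Gaussian elimination over the rationals. The function names are hypothetical and indices are 0-based.

```python
from fractions import Fraction
from itertools import permutations, product


def perm_operator_vector(pi, N):
    """Flattened entries of P_N(pi) = sum |i_1..i_k><i_pi(1)..i_pi(k)| (0-based indices)."""
    k = len(pi)
    dim = N ** k
    vec = [0] * (dim * dim)
    for idx in product(range(N), repeat=k):
        row = col = 0
        for a in range(k):
            row = row * N + idx[a]
            col = col * N + idx[pi[a]]
        vec[row * dim + col] = 1
    return vec


def rank(vectors):
    """Rank of a list of equal-length vectors, by exact row reduction."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def fixed_space_dimension(N, k):
    """dim Span{P_N(pi) : pi in S_k}, the number of unit eigenvalues of the twirl."""
    return rank([perm_operator_vector(pi, N) for pi in permutations(range(k))])
```

For $N = 3$, $k = 3$ this gives $3! = 6$, as the special case $N \ge k$ predicts, while for $N = 2$, $k = 3$ it gives 5, since the six permutation operators on three qubits satisfy one linear relation (the antisymmetrizer vanishes).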

In the quantum case, tensor product expanders give us a way to approximate the twirling
operator $\mathcal{T}_k$ of [2]. This is because

$$\|\hat{\mathcal{E}}_k^m - \hat{\mathcal{T}}_k\|_\infty \le \lambda^m, \tag{35}$$

so whenever $\lambda < 1$, $\mathcal{E}_k^\infty = \mathcal{T}_k$. Let us consider various other possibilities for implementing
twirling as a sum of different unitary transformations: one approach to exactly implementing
the twirling operation is to use t-designs [1], but the number of unitaries that must be
implemented in this case grows with $N$. Another approach was discussed in [20], which avoids
having the number of unitaries grow in $N$, but requires the ability to implement a number
of unitaries growing linearly in the logarithm of the error of the approximation. In contrast,
tensor product expanders require only the ability to implement a constant number of unitaries
to get arbitrarily good approximations. This is a definite advantage; however, in practice, our
construction of tensor product expanders here, which relies on the ability to construct random
unitary operations, probably cannot be efficiently implemented using gates; instead, we would
like to efficiently implement a deterministically constructed tensor product expander. This
raises the interesting question of whether the constructions of [5] can lead to tensor product
expanders also.
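As a rough illustration of what Eq. (35) means in practice, the sketch below computes how many rounds $m$ of the expander guarantee $\lambda^m \le \delta$. The function name `rounds_needed` and the parameter `delta` (the target error) are hypothetical; the function reports only the bound of Eq. (35), not the actual error of any particular expander.

```python
def rounds_needed(lam, delta):
    """Smallest m with lam**m <= delta, i.e. the bound of Eq. (35) is at most delta."""
    if not (0.0 < lam < 1.0) or not (0.0 < delta <= 1.0):
        raise ValueError("need 0 < lam < 1 and 0 < delta <= 1")
    m = 0
    bound = 1.0
    while bound > delta:
        bound *= lam   # one more application of the expander
        m += 1
    return m
```

For instance, `rounds_needed(0.5, 2**-10)` returns 10 and `rounds_needed(0.9, 0.01)` returns 44. The number of distinct unitaries stays fixed at $D$; only the number of rounds grows, logarithmically in $1/\delta$.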

The limit of large $k$: The situation when $k$ is large has some similarities to the classical
case. It still holds that any $(N, D, \lambda, k)$ quantum tensor product expander is also a $(N, D, \lambda, k')$
quantum tensor product expander for all $k' \le k$. In particular, if a set of unitaries forms a
$(N, D, \lambda, \infty)$ quantum tensor product expander then it is also a $(N, D, \lambda, k)$ quantum tensor
product expander for any finite $k$. This is equivalent to generating a Cayley graph expander
on $\mathcal{U}_N$. One difference between the quantum and classical cases is that there is no upper
bound to the size of irreps of $\mathcal{U}_N$, like there is for $S_N$.

Note that constant degree Cayley graph expanders are known for $\mathcal{U}_2$; indeed, choosing the
matrices at random will yield an expander with probability one [26]. However, no proof of this
fact is known for $N > 2$.

3.2  Solovay-Kitaev  gate  approximation 

One application of tensor product expanders is to the problem of approximating an
arbitrary $V \in \mathcal{U}_N$ with a string of gates from a fixed universal set $\{U(1), \ldots, U(D)\}$. The
fact that $\{U(1), \ldots, U(D)\}$ is universal means that $\langle U(1), \ldots, U(D)\rangle$ is dense in $\mathcal{U}_N$
(optionally neglecting an overall phase). This means that for any $V \in \mathcal{U}_N$ and any $\epsilon > 0$,
there exists a string $s_1, \ldots, s_m$ such that $U(s_1)U(s_2)\cdots U(s_m)$ is within a distance $\epsilon$ of $V$.
Often we also want to know (a) how quickly $m$ grows with $1/\epsilon$ and (b) how long it takes
to find $s_1, \ldots, s_m$. When $\{U(1), \ldots, U(D)\}$ contain their own inverses, the Solovay-Kitaev
theorem [21] gives a $\mathrm{poly}\log(1/\epsilon)$ time (for fixed $N$) algorithm to find an $\epsilon$-approximation with
$m = O(\log^{3+o(1)}(1/\epsilon))$. Very little is known in the case without access to inverses, except
that $U(s)^\dagger$ can be simulated to error $\epsilon$ using $O(1/\epsilon^{N^2})$ applications of $U(s)$, meaning that the
Solovay-Kitaev construction can be used with this amount of overhead.

Turning to lower bounds, observe that a ball of radius $\epsilon$ in $\mathcal{U}_N$ has volume $O(\epsilon^{N^2})$. This
implies that to approximate all strings to within error $\epsilon$ requires $\Omega((1/\epsilon)^{N^2})$ different unitaries,
or equivalently an $\Omega(N^2 \log 1/\epsilon)$ string length. A long-standing open question is whether the
Solovay-Kitaev approximation can in general be improved to use the optimal $O(\log 1/\epsilon)$
number of gates. Such optimally short approximations are known to exist whenever a particular
random walk on $\mathcal{U}_N$ has a gap [22]: specifically, the walk consisting of multiplying by $U(s)$ for
$s$ randomly chosen from $1, \ldots, D$. For $\mathcal{U}_2$, it was recently proven that generic $U(1), \ldots, U(D)$
are gapped [23] and thus yield short approximating strings. However, the situation for $\mathcal{U}_N$ for
$N > 2$ remains open.
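The volume-counting lower bound above can be written out as a short worked estimate. This is a sketch under stated assumptions: the Haar measure is normalized to total volume 1, a ball of radius $\epsilon$ has volume at most $C\epsilon^{N^2}$ for a constant $C$ depending only on $N$ (the constant $C$ is introduced here for illustration), and $D$ is held fixed.

```latex
% Strings of length at most m number at most D^{m+1}; the epsilon-balls
% around the corresponding products must cover the whole group.
\begin{align*}
\#\{\text{strings of length} \le m\} &\le \sum_{j=0}^{m} D^j \le D^{m+1}, \\
D^{m+1}\cdot C\,\epsilon^{N^2} &\ge 1, \\
m + 1 &\ge \frac{N^2 \log(1/\epsilon) - \log C}{\log D}
      = \Omega\!\left(N^2 \log(1/\epsilon)\right).
\end{align*}
```

The same count shows why $O(N^2 \log(1/\epsilon))$ is the best string length one can hope for, up to constants.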

In this section we will prove that when $k$ is sufficiently large, unitaries forming $k$-tensor
product expanders yield optimal $O(N^2 \log 1/\epsilon)$-length $\epsilon$-approximations for any gate in $\mathcal{U}_N$.

Theorem 5. Suppose $\{U(1), \ldots, U(D)\}$ form a $k$-tensor product expander with gap $1 - \lambda$
for $k$ sufficiently large. Then for any $V \in \mathcal{U}_N$ there exists a string $s_1, \ldots, s_m \in \{1, \ldots, D\}$ with

$$m = O(N^2 \log_{1/\lambda}(1/\epsilon)) \quad \text{and} \quad d(V, U(s_1)U(s_2)\cdots U(s_m)) \le \epsilon.$$

Here we define the distance between two unitaries $d(U, V)$ by

$$d(U, V) = \min_{\phi \in [0, 2\pi]} \|U - e^{i\phi}V\|_2^2 = 2N - 2|\mathrm{tr}\, U^\dagger V|,$$

so that it ignores overall phase.
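The closed form for the distance follows from expanding the Hilbert-Schmidt norm. The derivation below is a sketch that assumes $d(U,V)$ is defined with the squared 2-norm, which is the reading under which the stated equality $2N - 2|\mathrm{tr}\,U^\dagger V|$ holds.

```latex
\begin{align*}
\|U - e^{i\phi}V\|_2^2
  &= \mathrm{tr}\big[(U - e^{i\phi}V)^\dagger (U - e^{i\phi}V)\big] \\
  &= \mathrm{tr}(U^\dagger U) + \mathrm{tr}(V^\dagger V)
     - 2\,\mathrm{Re}\big(e^{i\phi}\,\mathrm{tr}\,U^\dagger V\big) \\
  &= 2N - 2\,\mathrm{Re}\big(e^{i\phi}\,\mathrm{tr}\,U^\dagger V\big), \\
\min_{\phi \in [0,2\pi]} \|U - e^{i\phi}V\|_2^2
  &= 2N - 2\,\big|\mathrm{tr}\,U^\dagger V\big| .
\end{align*}
```

The minimum is attained by choosing $\phi$ so that $e^{i\phi}\,\mathrm{tr}\,U^\dagger V$ is real and non-negative, which is what makes $d$ insensitive to an overall phase.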

The main result from [22] can be thought of as a weaker version of Theorem 5: it requires
$k = \infty$ to achieve the same conclusion. Unfortunately, Theorem 6 only shows that generic
sets of unitaries are $k$-tensor product expanders for $k \sim N^{1/6}/\log(N)$. Thus, at present the
existence of expanders satisfying the assumptions of Theorem 5 is a nontrivial conjecture. It
is possible that there exists some strengthening of the results of Theorem 6 which will allow
us to show that generic unitaries fulfill the assumptions of Theorem 5.

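Proof of Theorem 5: Let $|\Phi\rangle = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} |i\rangle|i\rangle$ be the maximally entangled state on
$\mathbb{C}^N \otimes \mathbb{C}^N$, and write $\Phi = |\Phi\rangle\langle\Phi|$. Define $\rho(U) = [(U \otimes I)\Phi(U^\dagger \otimes I)]^{\otimes k}$. Observe that

$$\mathrm{tr}\,\rho(U)\rho(V) = |\mathrm{tr}\, U^\dagger V|^{2k}/N^{2k} = \left(1 - \frac{d(U,V)}{2N}\right)^{2k}. \tag{36}$$

Let $B_{\epsilon/3}$ be the ball of radius $\epsilon/3$ around the identity: $B_{\epsilon/3} = \{U \,|\, d(U, I) \le \epsilon/3\}$. Let
$\mathrm{Vol}(\epsilon/3)$ denote the volume of $B_{\epsilon/3}$, which is $O((\epsilon/3)^{N^2})$. Define

$$\rho_\epsilon(U) = \frac{1}{\mathrm{Vol}(\epsilon/3)}\int_{V \in B_{\epsilon/3}} \rho(VU)\, dV. \tag{37}$$

Similarly we define

$$\rho_H = \int_{V \in \mathcal{U}_N} \rho(V)\, dV. \tag{38}$$

These states are normalized so that $\mathrm{tr}\,\rho_\epsilon(U) = \mathrm{tr}\,\rho_H = 1$. Since $\rho(V) \ge 0$ for all $V$, we have the
operator inequality $\rho_\epsilon(U) \le \rho_H/\mathrm{Vol}(\epsilon/3)$ for any $U$. Also observe that $\rho_H = (\mathcal{T}_k \otimes \mathrm{id}_N^{\otimes k})(\rho(U))$
for any $U$, where $\mathrm{id}_N$ denotes the identity operation on $N \times N$ density matrices.

We will find it convenient to think of density matrices as vectors with the Hilbert-Schmidt
inner product $(A, B) = \mathrm{tr}\, A^\dagger B$. In this picture $\mathcal{T}_k$ is a projector, and so

$$\mathrm{tr}\,\rho_\epsilon(U)\rho_H = \mathrm{tr}\,\rho_\epsilon(U)(\mathcal{T}_k \otimes \mathrm{id}_N^{\otimes k})(\rho_\epsilon(U)) = \mathrm{tr}\,\rho_H^2.$$

To bound $\mathrm{tr}\,\rho_H^2$, observe that the support of $\rho_H$ lies within $\mathrm{Span}\{|\psi\rangle^{\otimes k} : |\psi\rangle \in \mathbb{C}^{N^2}\}$, which
(according to [25, 24]) has dimension $\binom{N^2 + k - 1}{N^2} = k(k + 1)\cdots(k + N^2 - 1)/N^2! \le k^{N^2}$.
Thus $\mathrm{tr}\,\rho_H^2 \ge k^{-N^2}$.

Now we use the fact that $\|\hat{\mathcal{E}}_k^m - \hat{\mathcal{T}}_k\|_\infty \le \lambda^m$ together with Cauchy-Schwarz to bound

$$\mathrm{tr}\,\rho_\epsilon(I)\mathcal{E}_k^m(\rho_\epsilon(U)) \ge \mathrm{tr}\,\rho_\epsilon(I)\rho_H - \lambda^m\, \mathrm{tr}\,\rho_\epsilon(I)^2 \ge \mathrm{tr}\,\rho_H^2\left(1 - \frac{\lambda^m}{\mathrm{Vol}(\epsilon/3)^2}\right) \ge \frac{1}{2}\,\mathrm{tr}\,\rho_H^2, \tag{39}$$

where in the second-to-last step we have assumed

$$m \ge \log(2/\mathrm{Vol}(\epsilon/3)^2)/\log(1/\lambda) = O(N^2 \log_{1/\lambda}(1/\epsilon)).$$

On the other hand, if there is no string $s_1, \ldots, s_m$ such that $d(U(s_1)U(s_2)\cdots U(s_m), U) \le \epsilon$,
then

$$\mathrm{tr}\,\rho_\epsilon(I)\mathcal{E}_k^m(\rho_\epsilon(U)) \le \left(1 - \frac{\epsilon}{6N}\right)^{2k} \le e^{-k\epsilon/(3N)}. \tag{40}$$

If $k/\log k \gg N^3/\epsilon$ then (39) and (40) cannot simultaneously hold. Therefore there must
exist at least one string $s_1, \ldots, s_m$ for which $d(U(s_1)U(s_2)\cdots U(s_m), U) \le \epsilon$. ∎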

3.3 Trace Method and Schwinger-Dyson Equations

The next three sections are devoted to the expansion properties of randomly chosen unitaries.
Recall that we would like to construct a quantum tensor product expander by randomly
choosing $U(1), \ldots, U(D) \in \mathcal{U}_N$. There are two cases. In the non-Hermitian case, the unitary
matrices $U(s)$ are chosen independently with the Haar measure. In the Hermitian case, $D$
is even and the unitary matrices $U(s)$ for $s = 1, \ldots, D/2$ are chosen independently with the
Haar measure and $U(s + D/2) = U(s)^\dagger$, so that $\mathcal{E}_k$ is a Hermitian operator. We focus on the
Hermitian case, and the techniques can be readily extended to cover the non-Hermitian case.
Our main result is that for random $U(s)$, with high probability we do indeed get a tensor
product expander:

Theorem 6. Let $\{U(1), \ldots, U(D/2)\}$ be chosen randomly with the Haar measure from
the unitary group $\mathcal{U}_N$, and let $U(s + D/2) = U(s)^\dagger$. Let $k \le O(N^{1/6}/\log(N))$ and let $\lambda$
denote the $(F_k^{\mathcal{U}_N} + 1)$st eigenvalue of $\mathcal{E}_k$ as defined in (32). Then, for any $c > 1$,

$$\Pr\left[\lambda \ge c\left(1 + O(k \log(N) N^{-1/6})\right)\lambda_H\right] \le c^{-(1/4k)N^{1/6}}, \tag{41}$$

where $\lambda_H$ depends on $D$ and is given in Eq. (14).

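We use a trace method to bound the eigenvalues of $\mathcal{E}_k$. We have

$$\sum_{i_1, i_2, \ldots, i_k}\ \sum_{j_1, j_2, \ldots, j_k} \Big(M(i_1,j_1) \otimes M(i_2,j_2) \otimes \cdots \otimes M(i_k,j_k),\ \mathcal{E}_k^m\big(M(i_1,j_1) \otimes M(i_2,j_2) \otimes \cdots \otimes M(i_k,j_k)\big)\Big) = \sum_{a=1}^{N^{2k}} |\lambda_a|^m \ge k! + |\lambda|^m, \tag{42}$$

where we pick $m$ to be an even integer. We will derive bounds on the expectation value of
the trace to bound the expectation of $|\lambda|^m$. Eq. (42) can be re-written as

$$k! + |\lambda|^m \le \frac{1}{D^m}\sum_{s_1=1}^{D}\sum_{s_2=1}^{D}\cdots\sum_{s_m=1}^{D} \Big[\mathrm{tr}\big(U(s_m + D/2)\cdots U(s_2 + D/2)U(s_1 + D/2)\big)^k\ \mathrm{tr}\big(U(s_1)U(s_2)\cdots U(s_m)\big)^k\Big]. \tag{43}$$

Let $E[\ldots]$ denote the average over the unitary group. Averaging Eq. (43) we find

$$E_{1,k} = \left(\frac{1}{D}\right)^m \sum_{s_1=1}^{D}\sum_{s_2=1}^{D}\cdots\sum_{s_m=1}^{D} E_{0,k}(s_1, \ldots, s_m) \ge k! + E[|\lambda|^m], \tag{44}$$

where

$$\begin{aligned}
E_{0,k}(s_1, \ldots, s_m) &= E\big[\mathrm{tr}(U^\dagger(s_m)\cdots U^\dagger(s_2)U^\dagger(s_1))^k\ \mathrm{tr}(U(s_1)U(s_2)\cdots U(s_m))^k\big] \\
&= E\big[\mathrm{tr}(U(s_m + D/2)\cdots U(s_2 + D/2)U(s_1 + D/2))^k\ \mathrm{tr}(U(s_1)U(s_2)\cdots U(s_m))^k\big].
\end{aligned} \tag{45}$$

As in [19], we write the average in Eq. (45) as an average of the form

$$E[L_1 L_2 \cdots L_c], \tag{46}$$

where

$$L_1 = \mathrm{tr}(U(s_{1,1})U(s_{1,2})\cdots U(s_{1,m_1})), \quad L_2 = \mathrm{tr}(U(s_{2,1})U(s_{2,2})\cdots U(s_{2,m_2})), \quad \ldots \tag{47}$$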


Here we have an average of $c$ traces, each of which is a product of some number of unitary
matrices. In particular, Eq. (45) has $c = 2k$, with $L_1 = L_2 = \cdots = L_k = L_{k+1}^\dagger = \cdots = L_{2k}^\dagger$.

The Schwinger-Dyson equations for a product of this form are [19]:

$$\begin{aligned}
E[\mathrm{tr}&(U(s_{1,1})U(s_{1,2})\cdots U(s_{1,m_1}))L_2 \cdots L_c] \\
=\ &-\frac{1}{N}\sum_{j=2}^{m_1} \delta_{s_{1,1},\,s_{1,j}}\, E[\mathrm{tr}(U(s_{1,1})\cdots U(s_{1,j-1}))\,\mathrm{tr}(U(s_{1,j})\cdots U(s_{1,m_1}))L_2 \cdots L_c] \\
&+\frac{1}{N}\sum_{j=2}^{m_1} \delta_{s_{1,1},\,s_{1,j}+D/2}\, E[\mathrm{tr}(U(s_{1,2})\cdots U(s_{1,j-1}))\,\mathrm{tr}(U(s_{1,j+1})\cdots U(s_{1,m_1}))L_2 \cdots L_c] \\
&-\frac{1}{N}\sum_{l=2}^{c}\sum_{j=1}^{m_l} \delta_{s_{1,1},\,s_{l,j}}\, E[\mathrm{tr}(U(s_{1,1})\cdots U(s_{1,m_1})U(s_{l,j})U(s_{l,j+1})\cdots U(s_{l,j-1}))\,L_2 \cdots L_{l-1}L_{l+1} \cdots L_c] \\
&+\frac{1}{N}\sum_{l=2}^{c}\sum_{j=1}^{m_l} \delta_{s_{1,1},\,s_{l,j}+D/2}\, E[\mathrm{tr}(U(s_{1,2})\cdots U(s_{1,m_1})U(s_{l,j+1})U(s_{l,j+2})\cdots U(s_{l,j-1}))\,L_2 \cdots L_{l-1}L_{l+1} \cdots L_c].
\end{aligned} \tag{48}$$

Note that in the above equation an expression like $U(s_{l,j+1})U(s_{l,j+2})\cdots U(s_{l,j-1})$ means
$U(s_{l,j+1})U(s_{l,j+2})\cdots U(s_{l,m_l})U(s_{l,1})U(s_{l,2})\cdots U(s_{l,j-1})$.

Our general algorithm for reducing traces starts by canceling all pairs of matrices $U(s)U(s + D/2)$
appearing successively in the same trace, and replacing $\mathrm{tr}(I)$ by $N$. We then apply
Eq. (48), repeating the cancellation of successive $U(s)U(s + D/2)$ and replacement of $\mathrm{tr}(I)$ by
$N$ on each iteration. A term terminates at a given level $n$ if there are no matrices left after
$n$ iterations.

Let $m_1^0$ be the length of the trace after canceling successive $U(s)U(s + D/2)$ before any
iterations; on every successive iteration, the length of the first trace, $m_1$, is bounded by $m_1^0$.
As in [19], the number of different choices of $s_1, \ldots, s_m$ which give rise to a given $m_1^0$ is bounded
by

$$(D - 1)^{m_1^0/2}(D - 1)^{m/2}\, 2^m. \tag{49}$$

This number is equal to $D^m$ times the probability that a random walker on a Cayley tree
arrives at a distance $m_1^0$ from the starting point after a walk of $m$ steps. This number
is independent of the particular values of $s_{1,1}, \ldots, s_{1,m_1^0}$. There are $[D/(D - 1)](D - 1)^{m_1^0}$
different possible values of $s_{1,1}, \ldots, s_{1,m_1^0}$ and therefore the total number of choices of $s_1, \ldots, s_m$
which give rise to a given choice of $s_{1,1}, \ldots, s_{1,m_1^0}$ is bounded by

$$\frac{D - 1}{D}\left(\frac{1}{\sqrt{D - 1}}\right)^{m_1^0}(D - 1)^{m/2}\, 2^m. \tag{50}$$

The number of terms terminating at the $n$th level is bounded by

$$(2km - 1)^n. \tag{51}$$

To see this, note that at each iteration of the Schwinger-Dyson equation, the number of terms
on the right-hand side is bounded by the number of matrices on the left-hand side minus one.
Initially, there are $2km$ matrices, and this number does not increase under Eq. (48).

We can estimate the value of a term which terminates at a given level $n \ge 1$ as follows. First, there is a sign equal to plus or minus 1. Next, there is a factor of $(1/N)^n$. Finally, there is a factor of $N$ for each trace of the form $\mathrm{tr}(I)$ that appeared in this process. Suppose there are $p$ such traces, giving a factor of $N^p$. How big can $p$ be? Initially we have $c = 2k$ different traces. The given term at level $n$ arose from a specific choice of terms on the right-hand side of Eq. (48) on the first iteration. This specific choice has $k_1$ different traces in it, with $k_1$ equal to either $2k - 1$ or $2k + 1$. After the second iteration there are $k_2$ traces, then $k_3$, and so on. The number of traces $k_2, k_3, \ldots$ can be determined as follows: an application of Eq. (48) may increase the number of traces by one if the term arises from the first or second line on the right-hand side, or may decrease the number of traces by one if the term arises from the third or fourth line on the right-hand side of Eq. (48). Next, some of the traces may be trivial, being equal to $\mathrm{tr}(I)$. In the event that the term arose from the first, second, or third line of Eq. (48) it is not possible for any of the traces to be trivial, under the assumption that any repetitions of the form $U(s)U(s + D/2)$ have been previously replaced by $I$ in the trace on the left-hand side of the equation. However, in the event that the term arose from the fourth line, then it is possible for one of the traces to be trivial, increasing $p$ by one. Thus, for each $b \le n$, $k_b - k_{b-1}$ is equal to either $+1$, $-1$, or $-2$. Let $q$ be equal to the number of times the first or second line was used from Eq. (48) and $n - q$ equal the number of times the third or fourth line was used. Then, in order for all traces to be trivial in this particular term resulting from $n$ iterations of Eq. (48),

$$2k + q - (n - q) - p = 0. \tag{52}$$

Also, since $p$ can only increase when a term from the fourth line is used,

$$p \le n - q. \tag{53}$$

Thus,

$$p \le \lfloor (2k + n)/3 \rfloor. \tag{54}$$

Therefore, the value of a term terminating at the $n$th level, $n > 0$, is bounded in absolute value by

$$N^{\lfloor (2k+n)/3 \rfloor - n}. \tag{55}$$
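The step from Eqs. (52) and (53) to Eq. (54) can be checked by brute force. The following Python sketch (the function names are hypothetical, and Eq. (53) is read here as the non-strict inequality $p \le n - q$) enumerates every $q$ between $0$ and $n$, solves Eq. (52) for $p$, keeps the values allowed by Eq. (53), and confirms that none of them exceeds $\lfloor (2k+n)/3 \rfloor$.

```python
def admissible_p(k, n):
    """Values of p allowed by Eq. (52) together with 0 <= p <= n - q, for 0 <= q <= n."""
    values = []
    for q in range(n + 1):            # q = number of uses of the first or second line of Eq. (48)
        p = 2 * k + q - (n - q)       # Eq. (52) solved for p
        if 0 <= p <= n - q:           # Eq. (53), read as a non-strict inequality
            values.append(p)
    return values


def check_bound(max_k=6, max_n=40):
    """True if every admissible p satisfies p <= floor((2k + n) / 3), i.e. Eq. (54)."""
    return all(
        p <= (2 * k + n) // 3
        for k in range(1, max_k + 1)
        for n in range(1, max_n + 1)
        for p in admissible_p(k, n)
    )
```

For example, `admissible_p(2, 2)` contains only `p = 2`, which matches the bound $\lfloor 6/3 \rfloor = 2$ and gives a term of order $N^{0}$.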


Note that if $m_1^0 > 0$ then there are no terms terminating at level $n$ with $n < k$, so for $m_1^0 = 0$, the trace is equal to $N^{2k}$, while for $m_1^0 > 0$, the terms are bounded in absolute value by $N^0$ (this bound is only reached if $k = n$).

Eq. (48) generates an infinite series, whose $n$th term is the sum of all terms terminating at level $n$. As in [19], this series is absolutely convergent for $2km < N$. In fact, the following stronger claim holds: Eq. (48) generates an absolutely convergent series for $2km - 1 < N$ which converges to the expectation value of the trace. To see this, note that the value $p$ above, the number of traces of $I$, is always bounded by $2km$. Thus, the value of a term terminating at the $n$th level is bounded by

$$N^{2km} N^{-n}. \tag{56}$$

Depending on $n$, sometimes (55) gives a better bound and sometimes (56) gives a better bound, but to estimate convergence we will use (56). Eq. (51) shows that the number of terms terminating at level $n$ is bounded by $(2km - 1)^n$. Thus, the absolute value of the sum of terms terminating at level $n$ is bounded by $N^{2km}((2km - 1)/N)^n$, and so for $2km - 1 < N$, the series is absolutely convergent. Further, a term which has not terminated at the $n$th level contains at most $2km$ traces in it, and hence is bounded in absolute value by $N^{2km}(1/N)^n$. Therefore, the sum of all terms which have not terminated at the $n$th level is also bounded by $N^{2km}((2km - 1)/N)^n$, and hence for $2km - 1 < N$ the series converges to the average of the trace.
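The convergence argument is a geometric-series estimate with ratio $(2km-1)/N$. As an illustration only, the Python sketch below (hypothetical helper names, exact rational arithmetic) evaluates the level-$n$ bound $N^{2km}((2km-1)/N)^n$ and the closed form of its tail, and refuses to return a value when $2km - 1 \ge N$.

```python
from fractions import Fraction


def level_bound(N, k, m, n):
    """Bound N^(2km) * ((2km - 1)/N)^n on the sum of terms terminating at level n."""
    ratio = Fraction(2 * k * m - 1, N)
    return Fraction(N) ** (2 * k * m) * ratio ** n


def tail_bound(N, k, m, n):
    """Closed form of the sum over j >= n of level_bound(N, k, m, j); needs 2km - 1 < N."""
    ratio = Fraction(2 * k * m - 1, N)
    if ratio >= 1:
        raise ValueError("geometric bound does not converge: need 2km - 1 < N")
    return level_bound(N, k, m, n) / (1 - ratio)
```

With $N = 8$, $k = 1$, $m = 2$ the ratio is $3/8$, so the tail bound shrinks by that factor at every level.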


3.4  Example 

We now work out a simple example to give some idea of the use of the Schwinger-Dyson equations. This example will also be used later in the idea of “complete rung cancellation” and gives intuition behind the claim that for $N > k$ we have $k!$ eigenvalues equal to unity. Let the matrix $X$ be chosen from the unitary group with the Haar measure and evaluate the expectation value for $N > k$

$$E[(\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger))^k]. \tag{57}$$

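For $k = 1$, a single application of Eq. (48) shows that this is equal to unity. For $k = 2$, we find

$$\begin{aligned}
E[(\mathrm{tr}(X)\mathrm{tr}(X^\dagger))^2] &= 2E[\mathrm{tr}(X)\mathrm{tr}(X^\dagger)] - (1/N)\,E[\mathrm{tr}(XX)(\mathrm{tr}(X^\dagger))^2] \\
&= 2 + (1/N)^2 E[(\mathrm{tr}(X)\mathrm{tr}(X^\dagger))^2] - 2(1/N)^2 E[\mathrm{tr}(X)\mathrm{tr}(X^\dagger)] \\
&= 2 + (1/N)^2 E[(\mathrm{tr}(X)\mathrm{tr}(X^\dagger))^2] - 2(1/N)^2.
\end{aligned} \tag{58}$$

For $N \ge 2$, this shows that $E[(\mathrm{tr}(X)\mathrm{tr}(X^\dagger))^2] = 2$.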

It is interesting to see what happens to the expectation value in Eq. (58) for $N = 1$, $k = 2$. Then, the last line of Eq. (58) gives simply $E[(\mathrm{tr}(X)\mathrm{tr}(X^\dagger))^2] = E[(\mathrm{tr}(X)\mathrm{tr}(X^\dagger))^2]$, giving no information about the trace. For general $N$, the sum of terms terminating at level 1 is equal to zero, while the sum of terms terminating at levels $2, 3, 4, 5, 6, \ldots$ is equal to $2, -2/N, 2/N, -2/N^2, 2/N^2, \ldots$ respectively. Thus, we do not have a convergent series for $N = 1$, $k = 2$.
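The $k = 2$ computation amounts to a linear equation for the unknown expectation value $E$. Assuming the last line of Eq. (58) reads $E = 2 + E/N^2 - 2/N^2$, the Python sketch below (a hypothetical helper, using exact fractions) solves it and flags the degenerate case $N = 1$, where the equation collapses to $E = E$.

```python
from fractions import Fraction


def solve_k2(N):
    """Solve E = 2 + E/N^2 - 2/N^2 for E; return None when the equation is degenerate."""
    coeff = 1 - Fraction(1, N * N)   # coefficient of E after moving E/N^2 to the left
    rhs = 2 - Fraction(2, N * N)     # constant terms on the right
    if coeff == 0:                   # N = 1: the equation reads E = E and fixes nothing
        return None
    return rhs / coeff
```

For every integer $N \ge 2$ the function returns exactly 2, in line with the value $k! = 2$ discussed in the text.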

Up to now we have considered the series whose $n$th term is the sum of terms terminating at a given level $n$. We now consider instead the expectation value of Eq. (57) as a series in $1/N$. For $N > k$, this series is again absolutely convergent to the desired expectation value. It is easy to see that for arbitrary $k$, and for $N \gg k$, the expectation value (57) is equal to $k! + O(1/N)$, as there are $k!$ terms which terminate at level $k$. We now show that for $N > k$, the expectation value (57) is equal to $k!$ exactly. Note that the expectation value in Eq. (57) is equal to the trace of the map $T_k$ (defined in (34)).

Thus, the trace of the map $T_k(M)$ is equal to the number of unit eigenvalues of $T_k(M)$. For $N > k$ the trace of this map can then be written as the sum of an infinite series in $1/N$, and using the fact that the number of unit eigenvalues is equal to an integer for all integer $N$, we find that all terms in the series in $1/N$, beyond the term of order $N^0$, must vanish exactly (the calculation above represents an explicit check of this for $k = 2$ and it may be readily verified for any $k$). Thus, for all $N > k$, the expectation value of Eq. (57) is equal to $k!$. This gives an alternate proof that the number of unit eigenvalues is $k!$ when $N > k$.

3.5  Counting  and  Main  Result 

In this section we prove a bound on the expectation value of the sum in Eq. (44), which will give us a bound on the expectation value of the $m$th power of $\lambda$, proving the theorem. The next three paragraphs are devoted to outlining the basic idea of the proof, before beginning the technical details.

The basic idea of the proof is to prove the bound on the sum by proving a bound on the number of different choices of $s_1, \ldots, s_m$ such that, when the resulting trace is evaluated using the Schwinger-Dyson equations, there is a term which terminates at level $n$, for any given $n$. We give this bound on the number of choices of $s_1, \ldots, s_m$ in Eq. (61). We then combine this bound with a bound on the contribution to the trace of terms which terminate at level $n$. The idea is that there are only a small number of choices of $s_1, \ldots, s_m$ which produce terms which terminate at a small level $n$, and while there are a large number of choices of $s_1, \ldots, s_m$ which produce terms which terminate at high levels, such terms are small.

One technical caveat in this work is that for any choice of $s_1, \ldots, s_m$ there will be certain terms which terminate at a low level $n$. These are terms in which we use the Schwinger-Dyson equations to contract $U(s_i)$ in one trace with $U(s_i)^\dagger$ in a different trace. If for some $i$, we contract all unitaries $U(s_i)$ in this way, we have what is called a “complete rung cancellation” below. We consider such terms separately, and they are responsible for producing the leading order expectation value of the trace in $1/N$: these terms sum to give a contribution $k!$ to the expectation value of the trace, precisely corresponding to the value we expect from the unit eigenvalues.

Ignoring those terms with complete rung cancellations, we see that a term in the Schwinger-Dyson equations must involve contracting $U(s_i)$ with $U(s_j)$ or $U(s_j)^\dagger$ for some $i \neq j$. Such terms involve constraints: such a term would require that either $s_i = s_j + D/2$ or $s_i = s_j$. In order for such a term to terminate at a low level, there must be many such constraints, and this is why there are only a few choices of $s_1, \ldots, s_m$ which produce terms which terminate at low levels. To show precisely that there are only a few such choices of $s_1, \ldots, s_m$, we follow a different strategy. To explain this strategy, suppose you knew a choice of $s_1, \ldots, s_m$ which gave rise to a term which terminated at some level $n$ and you were given the task of explaining to someone which choice of $s_1, \ldots, s_m$ you used. One way to do this would be to simply list the $m$ different values of $s$. This would require communicating $\log_2(D^m)$ bits. We instead show how to uniquely specify the choices of $s_1, \ldots, s_m$ in a different way, by specifying most of the choices of $s_1, \ldots, s_m$ by describing which cancellations were used. For small $n$, this will allow one to communicate the specific choice of $s_1, \ldots, s_m$ in a much shorter way, thus implying that there are only a few choices of $s_1, \ldots, s_m$ which produce the desired term terminating at level $n$. We now put this idea into practice.

On a given iteration of the Schwinger-Dyson equations, we go from a product of $c$ traces to a product of $c + 1$, $c - 1$, or $c - 2$ traces. As in [19], we keep track of how the matrices move under this iteration process using a function $f_n((l,i))$ from pairs of integers to pairs of integers. We say that the matrix $U(s_{l,i})$ in the given product of traces, $L_1 L_2 \cdots L_c$, is in position $(l,i)$. Let us consider the case of a term on the first line, where $c$ increases by one. Then, for any given $j$ in the sum on the first line, we say that the matrix in position $(1,i)$, for $i < j$, on the $(n+1)$st iteration corresponds to the matrix in position $(1,i)$ on the $n$th iteration, and so $f_n((1,i)) = (1,i)$, while the matrix in position $(2,i)$ on the $(n+1)$st iteration corresponds to the matrix in position $(1, i + j - 1)$ on the $n$th iteration, so $f_n((1, i + j - 1)) = (2,i)$. The matrix in position $(l,i)$ for $2 < l \le c + 1$ on the $(n+1)$st iteration corresponds to the matrix $(l - 1, i)$ on the $n$th iteration, so $f_n((l - 1, i)) = (l,i)$. We follow a similar procedure for the other lines of Eq. (48) and if there are cancellations, we keep track of how the matrix moves under the cancellations.

We then keep track of which matrix after $n$ iterations corresponds to a given matrix before any iterations, by defining $F_n((l,i)) = f_n(f_{n-1}(\cdots f_1((l,i)) \cdots))$ for $l = 1, 2, \ldots, 2k$. Let us say that the matrix at position $(l,i)$ is “trivially moved” under the $n$th iteration of the Schwinger-Dyson equations if it is not in either position $(1,1)$ or position $(1,j)$ using a term on the first or second line, or in either position $(1,1)$ or position $(l,j)$ using a term from the third or fourth line. If a matrix is not trivially moved, and the matrix is not in position $(1,1)$, then the Schwinger-Dyson equations imply a relation between $s_{1,1}$ and $s_{l,i}$.

A given term in Eq. (48) arises from a given choice of $(l,j)$: for a term on the first or second line let us say $l = 1$. Let $(1,1) = F_n(l_0, j_0)$ and let $(l,j) = F_n(l_0', j_0')$. If a matrix is not trivially moved on the $n$th iteration then there are two cases. Case (1): either $l_0 \le k$ and $l_0' \le k$, or $l_0 > k$ and $l_0' > k$. That is, either both matrices appeared in one of the first $k$ traces, which are traces of products of conjugates of unitaries, or both matrices appeared in one of the last $k$ traces, which are traces of unitaries. Case (2): $l_0 \le k$ and $l_0' > k$, or $l_0 > k$ and $l_0' \le k$. That is, one matrix was in one of the first $k$ traces and the other was in one of the last $k$ traces. We then break the first case into two sub-cases: (a) $j_0 = j_0'$ or (b) $j_0 \neq j_0'$. We also break the second case into two sub-cases: (a) $j_0 = m_1^0 + 1 - j_0'$ or (b) $j_0 \neq m_1^0 + 1 - j_0'$. In case 1a both matrices are unitary matrices $U(s_{1,j_0})$ or both are $U(s_{1,j_0})^\dagger$, and in case 2a, one matrix is $U(s_{1,j_0})$ and the other is $U(s_{1,j_0})^\dagger$. In case 1b, we know that $s_{1,j_0} = s_{1,j_0'}$ for $j_0 \neq j_0'$, while in case 2b we know that $s_{1,j_0} = s_{1,j_0'} + D/2$ for $j_0 \neq j_0'$. Thus, in case 1b or 2b the term in the Schwinger-Dyson equation implies some constraint about the choice of $s_{1,j}$. To illustrate these different cases, consider the example (58): the first term on the right-hand side of the top line is an example of case 2a, while the second term on the same line is an example of case 1a.

Consider a given $j$; if on some iteration and for some $l$ the matrix which was originally in position $(l,j)$ is not trivially moved and we have case 1b or 2b, then we can identify some $k$ such that either $s_{1,j} = s_{1,k}$, or $s_{1,j} = s_{1,k} + D/2$. Let us write $k = r(j)$ in both cases, for some function $r(j)$. We define a term to have a “complete rung cancellation of matrix $j$” if it is not possible to identify such a $k$ for the given $j$. We claim that the sum of all terms with a complete rung cancellation of matrix $i$ is equal to $k!$ so long as $k < N$. To show this, consider the product of traces

$$\mathrm{tr}\big(U(s_m + D/2) \cdots U(s_{i+1} + D/2)\, X^\dagger\, U(s_{i-1} + D/2) \cdots U(s_1 + D/2)\big)^k \times \mathrm{tr}\big(U(s_1) \cdots U(s_{i-1})\, X\, U(s_{i+1}) \cdots U(s_m)\big)^k, \tag{59}$$

where $X$ is some arbitrary unitary matrix. Averaging this trace over all unitary matrices $U(s)$ and over all unitary matrices $X$ with the Haar measure, we find that the trace is equal to $k!$: this can be established by applying Eq. (48) to this trace, and always cyclically permuting the trace so that $X$ is in the first position. This calculation is very similar to the example calculation (57) above. However, applying the Schwinger-Dyson equations to the trace (59) without first applying the cyclic permutation generates precisely the sum of terms mentioned above, those in which there is a complete rung cancellation of matrix $i$. Thus, this sum of terms equals $k!$. We further claim that for any given $i_1, i_2, \ldots, i_d$, the sum of all terms with complete rung cancellations of matrices $i_1, i_2, \ldots, i_d$ is equal to $k!$, as may be shown by considering a trace in which matrices $U(s_{i_1}), U(s_{i_2}), \ldots$ are replaced by $X_1, X_2, \ldots$, and the trace is averaged over the different $X_1, X_2, \ldots$. Then, using the inclusion-exclusion principle, the sum of terms in which for no $i$ is there a complete rung cancellation of matrix $i$ is equal to the sum of all terms minus $k!$. So, we now focus on the sum of terms with no complete rung cancellations, which we define to be $E'_{0,k}(s_1, \ldots, s_m)$; if a given choice of $s_1, \ldots, s_m$ gives rise to a term which terminates at level $n$ with no complete rung cancellations, then it is possible to identify an $r(i)$ for each $i$.

We now follow the same approach as in [19] to bound the number of choices of $s_1, \ldots, s_{m_1^0}$ which can produce a term which terminates at a level $n$ with no complete rung cancellations. Given the sequence of choices of terms on the right-hand side of the Schwinger-Dyson equation (48), as well as knowledge of which cancellations occurred at each iteration, we know the function $r(i)$, and given this function $r(i)$ there are now only at most $[D/(D-1)](D-1)^{m_1^0/2}$ possible values of $s_{1,1}, \ldots, s_{1,m_1^0}$. Thus, the total number of choices of $s_1, \ldots, s_{m_1^0}$ which can produce a term which terminates at level $n$ is bounded by the number of possible choices of terms and cancellations in the Schwinger-Dyson equation (48) at each of the $n$ iterations multiplied by $[D/(D-1)](D-1)^{m_1^0/2}$. At each iteration of the Schwinger-Dyson equations, we make a particular choice of $l, j$, which requires specifying one particular matrix out of all the matrices on the right-hand side; there are at most $2m_1^0 k - 1$ matrices on the right-hand side, so there are at most $2m_1^0 k - 1$ choices (in [19], the slightly worse bound $(2m_1^0 k - 1)^2$ was found; we tighten the bound here). At each such iteration of the Schwinger-Dyson equations, there may be cancellations in two different traces if the term came from the second line of Eq. (48), with at most $m_1^0$ cancellations in each trace, or cancellations in two different places of a single trace, if the term came from the fourth line of Eq. (48), with at most $m_1^0$ cancellations in each place. Let us call the number of cancellations $c_1, c_2$ with $0 \le c_1 \le m_1^0$ and $0 \le c_2 \le m_1^0$. Then, by specifying $l, j, c_1, c_2$ for each iteration, we succeed in fully specifying how the matrices move under the $n$ iterations of the Schwinger-Dyson equation; this requires specifying $n$ numbers ranging from $1$ to $2km_1^0 - 1$, and $2n$ numbers ranging from $0$ to $m_1^0$.

Thus, there are at most

$$[D/(D-1)](D-1)^{m_1^0/2}(2km_1^0 - 1)^n (m_1^0 + 1)^{2n} \le [D/(D-1)](D-1)^{m_1^0/2}(2km_1^0)^{3n} \tag{60}$$

choices of $s_1, \ldots, s_{m_1^0}$ which can produce a term which terminates at level $n$. Using Eq. (50), the number of choices of $s_1, \ldots, s_m$ which can produce a term which terminates at level $n$ is at most

$$\sum_{m_1^0 = 0}^{m} (D-1)^{m/2}\, 2^m (2km_1^0)^{3n} \le (D-1)^{m/2}\, 2^m\, \frac{(2km + 1)^{3n+1}}{3n+1}. \tag{61}$$
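Both counting steps reduce to integer inequalities once the common prefactors are cancelled: $(2km_1^0 - 1)^n (m_1^0 + 1)^{2n} \le (2km_1^0)^{3n}$ for Eq. (60), and $\sum_{m_1^0=0}^{m} (2km_1^0)^{3n} \le (2km+1)^{3n+1}/(3n+1)$ for Eq. (61), assuming the sum runs up to $m$. The Python sketch below (hypothetical function names, a finite range of parameters) checks both with exact integer arithmetic.

```python
def lhs_60(k, m0, n):
    """(2k*m0 - 1)^n choices of (l, j) times (m0 + 1)^(2n) choices of (c1, c2)."""
    return (2 * k * m0 - 1) ** n * (m0 + 1) ** (2 * n)


def rhs_60(k, m0, n):
    """The simplified bound (2k*m0)^(3n)."""
    return (2 * k * m0) ** (3 * n)


def sum_61(k, m, n):
    """Sum over m0 = 0..m of (2k*m0)^(3n)."""
    return sum((2 * k * m0) ** (3 * n) for m0 in range(m + 1))


def counting_bounds_hold(max_k=4, max_m=12, max_n=6):
    """Check both inequalities for 1 <= k <= max_k, 1 <= m <= max_m, 1 <= n <= max_n."""
    for k in range(1, max_k + 1):
        for n in range(1, max_n + 1):
            for m in range(1, max_m + 1):
                if lhs_60(k, m, n) > rhs_60(k, m, n):
                    return False
                # Multiply through by (3n + 1) so that only integers are compared.
                if (3 * n + 1) * sum_61(k, m, n) > (2 * k * m + 1) ** (3 * n + 1):
                    return False
    return True
```

The second inequality is the usual comparison of a sum of powers with the integral of $y^{3n}$ over $[0, 2km + 1]$.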

For any $s_1, \ldots, s_m$, we define $n_{\min}(s_1, \ldots, s_m)$ to be the smallest level at which a term terminates with no complete rung cancellations. The sum of terms with $m_1^0 = 0$, which is the same as the sum of terms with $n_{\min} = 0$, is bounded by

$$N^{2k} D^{-m} (D-1)^{m/2}\, 2^m = N^{2k} \lambda_H^m. \tag{62}$$


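Thus, we re-write the sum in Eq. (44) as

$$E_{1,k} \le k! + N^{2k}\lambda_H^m + D^{-m} \sum_{n=k}^{\infty} \sum_{s_1=1}^{D} \sum_{s_2=1}^{D} \cdots \sum_{s_m=1}^{D} \delta_{n_{\min}(s_1,\ldots,s_m),\,n}\, E'_{0,k}(s_1, \ldots, s_m). \tag{63}$$

Therefore, for any $s_1, \ldots, s_m$ with $n_{\min} > 0$,

$$E'_{0,k}(s_1, \ldots, s_m) \le \sum_{n \ge n_{\min}} N^{2(k-n)/3}(2km - 1)^n = \frac{N^{2k/3}\,[N^{-2/3}(2km - 1)]^{n_{\min}}}{1 - N^{-2/3}(2km - 1)}. \tag{64}$$

From Eqs. (61), (63), and (64),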


$$\begin{aligned}
E_{1,k} &\le k! + \lambda_H^m \left( N^{2k} + N^{2k/3} \sum_{n=k}^{\infty} \frac{(2km+1)^{3n+1}}{3n+1}\, \frac{[N^{-2/3}(2km-1)]^n}{1 - N^{-2/3}(2km-1)} \right) \\
&\le k! + \lambda_H^m \left( N^{2k} + N^{2k/3} \sum_{n=k}^{\infty} \frac{2km+1}{(3n+1)[1 - N^{-2/3}(2km-1)]} \left[ N^{-2/3}(2km+1)^4 \right]^n \right).
\end{aligned} \tag{65}$$

We then pick $m = (1/4k)N^{1/6}$, so that $N^{-2/3}(2km + 1)^4 \le 1/2$ and

|A  <  (Ehk- l)1/™  <  +  (66) 

$= \lambda_H (1 + O(\log(N)\, k N^{-1/6}))$.

Using Markov’s inequality, the probability that $|\lambda|$ is greater than $c\,(1 + O(k \log(N) N^{-1/6}))\,\lambda_H(D)$, for any $c > 1$, is bounded by $c^{-(1/4k)N^{1/6}}$.

4  Discussion 

We  have  introduced  quantum  and  classical  tensor  product  expanders.  These  provide  a  way  to 
approximate  t-designs  by  acting  many  times  with  a  small  number  of  unitaries.  An  important 
open  question  is  whether  efficient  implementations  of  these  tensor  product  expanders  exist. 
Appendix A  Proof of Lemma 1

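First, we reduce to the case when the matrices are $2 \times 2$ with $\Pi = |1\rangle\langle 1|$ and $X$ is diagonal. Express $\|pX + (1-p)Y\|$ as the maximum of $\langle\psi|pX + (1-p)Y|\psi\rangle$ over all unit vectors $|\psi\rangle$. Write $|\psi\rangle$ as $|\psi\rangle = \cos(\theta)|\psi_1\rangle + \sin(\theta)|\psi_2\rangle$, where $0 \le \theta \le \pi/2$ and $|\psi_1\rangle, |\psi_2\rangle$ are normalized vectors such that $\Pi|\psi_1\rangle = |\psi_1\rangle$ and $(I - \Pi)|\psi_2\rangle = |\psi_2\rangle$. Our conditions on $X$ imply that $\langle\psi|X|\psi\rangle = \cos^2(\theta) + \langle\psi_2|X|\psi_2\rangle \sin^2(\theta)$ and that $|\langle\psi_2|X|\psi_2\rangle| \le 1 - \epsilon_X$. Next, for $i, j = 1, 2$ define $Y_{i,j} = \langle\psi_i|Y|\psi_j\rangle$. Since $\|Y\| \le 1$, we also have that $\|\sum_{i,j=1}^{2} Y_{i,j}|i\rangle\langle j|\| \le 1$. We can now replace $Y$ with $\sum_{i,j=1}^{2} Y_{i,j}|i\rangle\langle j|$ and $X$ with $|1\rangle\langle 1| + \langle\psi_2|X|\psi_2\rangle\, |2\rangle\langle 2|$.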

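Now suppose that $|\langle\psi|X|\psi\rangle| \ge 1 - \epsilon_X\epsilon_Y/12$. Using our bound on $|\langle\psi_2|X|\psi_2\rangle|$, we obtain

$$1 - \frac{\epsilon_X \epsilon_Y}{12} \le \cos^2(\theta) + \sin^2(\theta)(1 - \epsilon_X) = 1 - \sin^2(\theta)\,\epsilon_X,$$

implying that $\sin^2(\theta) \le \epsilon_Y/12$. We will show that this yields an upper bound on $|\langle\psi|Y|\psi\rangle|$. Since $\|Y\| \le 1$, we have $|Y_{1,2}|, |Y_{2,1}| \le \sqrt{1 - |Y_{1,1}|^2}$. Thus

$$\begin{aligned}
|\langle\psi|Y|\psi\rangle| &\le \cos^2(\theta)|Y_{1,1}| + \sin(\theta)\cos(\theta)(|Y_{1,2}| + |Y_{2,1}|) + \sin^2(\theta)|Y_{2,2}| \\
&\le \cos(\theta)|Y_{1,1}| + \sin(\theta)\, 2\sqrt{1 - |Y_{1,1}|^2} + \frac{\epsilon_Y}{12}.
\end{aligned} \tag{A.1}$$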

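If $\theta$ were not constrained then the first two terms of (A.1) would be maximized by taking $\theta$ to be $\hat\theta = \arctan(2\sqrt{1 - |Y_{1,1}|^2}/|Y_{1,1}|) \ge \arctan(2\sqrt{2\epsilon_Y - \epsilon_Y^2}/(1 - \epsilon_Y)) \ge \arctan(2\sqrt{2\epsilon_Y})$. Using $\sin^2(\arctan(z)) = z^2/(1 + z^2)$, we have $\sin^2(\hat\theta) \ge 8\epsilon_Y/(1 + 8\epsilon_Y) \ge \epsilon_Y/2$. Since $\theta$ is constrained to lie in $[0, \arcsin(\sqrt{\epsilon_Y/12})]$, it cannot equal $\hat\theta$. Thus maximizing (A.1) will require setting $\theta$ to one of the endpoints of the allowed region. In particular, the maximum value of (A.1) occurs when $\sin^2(\theta) = \epsilon_Y/12$. A similar argument proves that setting $|Y_{1,1}| = 1 - \epsilon_Y$ maximizes (A.1) as well. Now we calculate

$$|\langle\psi|Y|\psi\rangle| \le (1 - \epsilon_Y) + 2\sqrt{\frac{\epsilon_Y}{12}}\sqrt{2\epsilon_Y - \epsilon_Y^2} + \frac{\epsilon_Y}{12} \le 1 - \left(1 - \sqrt{\tfrac{2}{3}} - \tfrac{1}{12}\right)\epsilon_Y \le 1 - \frac{\epsilon_Y}{10}. \tag{A.2}$$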
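The final constant in (A.2) is tight enough that a numerical check is reassuring. Reading (A.2) as $(1-\epsilon_Y) + 2\sqrt{\epsilon_Y/12}\sqrt{2\epsilon_Y - \epsilon_Y^2} + \epsilon_Y/12 \le 1 - (1 - \sqrt{2/3} - 1/12)\,\epsilon_Y \le 1 - \epsilon_Y/10$, the Python sketch below (hypothetical names, floating-point arithmetic on a grid of $\epsilon_Y \in (0,1]$) verifies both inequalities and shows that the coefficient $1 - \sqrt{2/3} - 1/12$ exceeds $1/10$ only narrowly.

```python
import math

# Coefficient of eps_Y that remains after bounding sqrt(2*eps - eps^2) by sqrt(2*eps).
SLACK = 1 - math.sqrt(2 / 3) - 1 / 12


def a2_first_bound(eps_y):
    """(1 - eps) + 2*sqrt(eps/12)*sqrt(2*eps - eps^2) + eps/12."""
    return ((1 - eps_y)
            + 2 * math.sqrt(eps_y / 12) * math.sqrt(2 * eps_y - eps_y ** 2)
            + eps_y / 12)


def a2_holds(samples=1000, tol=1e-12):
    """Check both inequalities of the chain on eps = 1/samples, 2/samples, ..., 1."""
    for i in range(1, samples + 1):
        eps = i / samples
        if a2_first_bound(eps) > 1 - SLACK * eps + tol:
            return False
        if 1 - SLACK * eps > 1 - eps / 10 + tol:
            return False
    return True
```

Here `SLACK` is roughly 0.1002, which is why the stated constant $1/10$ cannot be improved much by this route.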

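We have shown that for any $\psi$, either $\langle\psi|X|\psi\rangle \le 1 - \epsilon_X\epsilon_Y/12$ or $\langle\psi|Y|\psi\rangle \le 1 - \epsilon_Y/10$. We now use the triangle inequality to bound

$$\begin{aligned}
\langle\psi|pX + (1-p)Y|\psi\rangle &\le \max\left( p\left(1 - \frac{\epsilon_X\epsilon_Y}{12}\right) + (1-p),\; p + (1-p)\left(1 - \frac{\epsilon_Y}{10}\right) \right) \\
&\le 1 - \frac{\epsilon_Y}{12}\min(p\,\epsilon_X,\, 1-p).
\end{aligned} \tag{A.3}$$

Since this bound applies for all normalized $|\psi\rangle$, it must also upper-bound $\|pX + (1-p)Y\|$. Thus we obtain (30). The remaining steps of the Lemma are direct calculations.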
The Quantum Schur and Clebsch-Gordan Transforms:
I. Efficient Qudit Circuits

Dave Bacon, Isaac L. Chuang, and Aram W. Harrow


Abstract 

We present an efficient family of quantum circuits for a fundamental primitive in quantum information theory, the Schur transform. The Schur transform on $n$ $d$-dimensional quantum systems is a transform between a standard computational basis and a labelling related to the representation theory of the symmetric and unitary groups. If we desire to implement the Schur transform to an accuracy of $\epsilon$, then our circuit construction uses a number of gates which is polynomial in $n$, $d$ and $\log(\epsilon^{-1})$. The key tool in our construction is a $\mathrm{poly}(d, \log n, \log(\epsilon^{-1}))$ algorithm for the $\mathcal{U}_d$ Clebsch-Gordan transform. Our efficient circuit construction renders numerous protocols in quantum information theory computationally tractable and yields a new possible approach to quantum algorithms which is distinct from the standard paradigm of the quantum Fourier transform.

1  Introduction 

The last decade has seen the development and expansion of a robust theory of quantum information [1]. However, despite much progress in understanding optimal rates for manipulating and transmitting quantum information, many results may not be of practical value, even if large-scale quantum computers and quantum communication networks could be built. This is because many of the optimal protocols assume unbounded (or at least exponential) quantum computational resources for each local party. An analogous situation arises classically, for example, in the theory of classical error correcting codes, where it can be difficult to reconcile the goals of efficient communication rates and computationally-efficient encoding and decoding.

While the goal of performing classical coding tasks in polynomial or even linear time has long been studied, quantum information theory results have typically ignored questions of efficiency. For example, random quantum coding results (such as [2, 3, 4, 5]) require an exponential number of bits to describe, and like classical random coding techniques, do not yield efficient algorithms. There are a few important exceptions. Some quantum coding tasks, such as Schumacher compression [6, 7], are essentially equivalent to classical circuits, and as such can be performed efficiently on a quantum computer by carefully modifying an efficient classical algorithm to run reversibly and to deal properly with ancilla systems [8]. Another example, which illustrates some of the challenges involved, is Ref. [9]’s efficient implementation of entanglement concentration [10]. Quantum key distribution [11] not only runs efficiently, but can be implemented with entirely, or almost entirely, single-qubit operations and classical computation. Finally, some randomized quantum code constructions have been given efficient constructions using classical derandomization techniques in [12].

In this paper we present an efficient family of quantum circuits for a transform used ubiquitously [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] in quantum information protocols: the Schur transform. Our efficient construction of the Schur transform adds to the above list a powerful new tool for finding algorithms that implement quantum communication tasks.

The Schur transform is a unitary transform on $n$ $d$-dimensional quantum systems ($n$ qudits). The basis change corresponding to the Schur transform goes from a standard computational basis on the $n$ qudits to a labelling related to the representation theory of the symmetric and unitary groups; much like the Fourier transform, it thus transforms from a local to a more global, collective basis, which captures symmetries of the system. In this article we show how to efficiently implement the Schur transform as a quantum circuit. The size of the circuit we construct is polynomial in the number of qudits, $n$, the dimension of the individual quantum systems, $d$, and the log of the accuracy to which we implement the transform, $\log(\epsilon^{-1})$. Our efficient quantum circuit for the Schur transform makes possible efficient quantum circuits for numerous quantum information tasks: optimal spectrum estimation [13, 24], universal entanglement concentration [14], universal compression with optimal overflow exponent [15, 16], encoding into decoherence-free subsystems [18, 19, 20, 21], optimal hypothesis testing [17], and quantum and classical communication without shared reference frames [22]. The central role of the Schur transform in all of these protocols (as well as others like the quantum reverse Shannon theorem [25] where other aspects of the protocol remain inefficient) is due to the fact that the symmetries of independent and identically distributed quantum states are naturally treated by the representation theory of the symmetric and unitary groups.

The Schur transform is only defined up to a choice of the Schur basis, and a key technical component of our algorithm will be the selection of certain subgroup-adapted bases for the Schur basis. In particular we use the Gel’fand-Zetlin basis [26] and the Young-Yamanouchi basis (sometimes called Young’s orthogonal basis) [27]. The usefulness of subgroup-adapted bases to quantum algorithms was recognized by Beals [28] and Moore and Russell [29] in their algorithms for efficient Fourier transforms on nonabelian finite groups. We will similarly exploit the recursive structure of subgroup-adapted bases to build efficient recursive algorithms for the Clebsch-Gordan and Schur transforms. However, we emphasize that the Schur transform is not equivalent to the Fourier transform over $\mathcal{S}_n$, $\mathcal{U}_d$ or any other group, although connections between such transforms and the Schur transform exist, and will be discussed in part II of this paper (see also Chapter 8 of [30]).

By choosing the Gel’fand-Zetlin basis and the Young-Yamanouchi basis, we are able to show that the Schur transform can be constructed from a cascade of Clebsch-Gordan transforms (in rough analogy to the iterative constructions of [29]). To implement the Clebsch-Gordan transform, we use the Wigner-Eckart theorem and the Gel’fand-Zetlin basis to derive a recursive expression for the $d$ dimensional Clebsch-Gordan transform in terms of the $d - 1$ dimensional Clebsch-Gordan transform and small, efficiently implementable, unitary transforms. (The efficiency of this reduction is reminiscent of the use of adapted diameter in [29], but not directly related since $\mathcal{U}_d/\mathcal{U}_{d-1}$ is not finite.) The resulting recursive circuit for the Clebsch-Gordan transform can achieve accuracy $\epsilon$ using $\mathrm{poly}(d, \log n, \log 1/\epsilon)$ gates in contrast with the $n^{O(d^2)}$ gates that would be required by a naive construction. The total size of our circuit for the Schur transform is thus $n \cdot \mathrm{poly}(d, \log n, \log 1/\epsilon)$.

The remainder of the paper is as follows. In Section 2 we define the Schur transform along with the necessary basic concepts from representation theory. In Section 3 we introduce the basis labelling scheme used in the Schur transform using the concept of a subgroup-adapted basis. Once we have a concrete Schur basis defined, we describe the Clebsch-Gordan transform and explain how to use it to give an efficient circuit for the Schur transform in Section 4. Details on efficiently implementing the Clebsch-Gordan transform are in an appendix.

2  Representation theory and the Schur transform

Schur duality relates to the representation theory of the symmetric group on $n$ elements, $\mathcal{S}_n$, and the group of $d \times d$ unitary matrices, $\mathcal{U}_d$. In this section we will state facts about these representations without proof; for more details the reader should consult [31] or the longer version of this paper ([30] and future work).

2.1  Representation theory: A representation $(\mathbf{r}, V)$ of a group $G$ is a complex vector space $V$ together with a homomorphism from $G$ to $\mathrm{End}(V)$, i.e. a function $\mathbf{r} : G \to \mathrm{End}(V)$ such that $\mathbf{r}(g_1)\mathbf{r}(g_2) = \mathbf{r}(g_1 g_2)$. We say a representation $(\mathbf{r}, V)$ is irreducible (an irrep) if the only $\mathbf{r}$-invariant subspaces of $V$ are the zero subspace $\{0\}$ and the entire space $V$. In order to apply our results to quantum computing, we consider only the case when $V$ is complex and finite dimensional and $\mathbf{r}(g)$ is unitary for all $g \in G$. This way a unit vector in $V$ can represent the state of a quantum system and group elements $g \in G$ correspond to unitary rotations $\mathbf{r}(g)$, which could in principle be performed by a quantum computer.

We now turn to the representations relevant to the Schur transform. Consider a system of $n$ $d$-dimensional quantum systems: $n$ qudits. Fix a standard basis $|i\rangle$, $i = 1, \ldots, d$, for the state space of each qudit: $\mathbb{C}^d$. A basis for $(\mathbb{C}^d)^{\otimes n}$ (which we call the computational basis) is then $|i_1\rangle \otimes |i_2\rangle \otimes \cdots \otimes |i_n\rangle = |i_1, i_2, \ldots, i_n\rangle$ where $i_k = 1, \ldots, d$. In terms of this basis, we can define the action of $\mathcal{S}_n$ as

$$\mathbf{P}(s)|i_1, i_2, \ldots, i_n\rangle = |i_{s^{-1}(1)}, i_{s^{-1}(2)}, \ldots, i_{s^{-1}(n)}\rangle$$

for $s \in \mathcal{S}_n$. The unitary group $\mathcal{U}_d$, on the other hand, acts on $(\mathbb{C}^d)^{\otimes n}$ according to the $n$-fold product action as

$$\mathbf{Q}(U)|i_1, i_2, \ldots, i_n\rangle = U|i_1\rangle \otimes U|i_2\rangle \otimes \cdots \otimes U|i_n\rangle$$

for any $U \in \mathcal{U}_d$.

Note that $\mathbf{P}$ and $\mathbf{Q}$ commute, which means they can be simultaneously decomposed into irreps. Schur duality (or Schur-Weyl duality) [31, 32] goes farther and describes the exact nature of this decomposition, but in order to state it, we will first need to specify the irreps of $\mathcal{S}_n$ and $\mathcal{U}_d$.
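The commutation of the two actions can be checked numerically on a small instance. The sketch below is illustrative only: the helper names `apply_Q`, `apply_P`, and `commute_on_basis` are hypothetical, indices are 0-based, states are stored as dictionaries from index tuples to amplitudes, and a real rotation stands in for a general unitary.

```python
import itertools
import math

def apply_Q(U, state, d, n):
    """Apply U to every tensor factor of a state {index tuple: amplitude}."""
    for k in range(n):
        new = {}
        for idx, amp in state.items():
            for a in range(d):
                coeff = U[a][idx[k]]  # matrix element <a|U|i_k>
                if coeff != 0:
                    key = idx[:k] + (a,) + idx[k + 1:]
                    new[key] = new.get(key, 0.0) + coeff * amp
        state = new
    return state

def apply_P(s, state):
    """P(s)|i_1..i_n> = |i_{s^-1(1)}, ..., i_{s^-1(n)}>; s[k] is the image of k."""
    n = len(s)
    inv = [0] * n
    for k, sk in enumerate(s):
        inv[sk] = k
    return {tuple(idx[inv[k]] for k in range(n)): amp for idx, amp in state.items()}

def commute_on_basis(U, s, d):
    """Check P(s)Q(U) = Q(U)P(s) on every computational basis state."""
    n = len(s)
    for idx in itertools.product(range(d), repeat=n):
        a = apply_P(s, apply_Q(U, {idx: 1.0}, d, n))
        b = apply_Q(U, apply_P(s, {idx: 1.0}), d, n)
        if any(abs(a.get(k, 0.0) - b.get(k, 0.0)) > 1e-12 for k in set(a) | set(b)):
            return False
    return True

theta = 0.3
U = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
all_ok = all(commute_on_basis(U, s, 2) for s in itertools.permutations(range(3)))
print(all_ok)
```

Here three qubits and all six permutations are tested; larger $n$ or $d$ only changes the loop sizes.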

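Let $\mathcal{I}_{d,n} = \{\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_d) \mid \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge 0 \text{ and } \sum_{i=1}^{d} \lambda_i = n\}$ denote partitions of $n$ into $\le d$ parts. We consider two partitions $(\lambda_1, \ldots, \lambda_d)$ and $(\lambda_1, \ldots, \lambda_d, 0, \ldots, 0)$ equivalent if they differ only by trailing zeroes; according to this principle, $\mathcal{I}_n := \mathcal{I}_{n,n}$ contains all the partitions of $n$. Partitions label irreps of $\mathcal{S}_n$ and $\mathcal{U}_d$ as follows: if we let $d$ vary, then $\mathcal{I}_{d,n}$ labels irreps of $\mathcal{S}_n$ (or sometimes we use $\mathcal{I}_n$), and if we let $n$ vary, then $\mathcal{I}_{d,n}$ labels polynomial irreps of $\mathcal{U}_d$ (or sometimes we use $\mathbb{Z}^d_{++} := \cup_n \mathcal{I}_{d,n}$). Call these irreps $(\mathbf{p}_\lambda, \mathcal{P}_\lambda)$ and $(\mathbf{q}_\lambda^d, \mathcal{Q}_\lambda^d)$ respectively, for $\lambda \in \mathcal{I}_{d,n}$. We need the superscript $d$ because the same partition $\lambda$ can label different irreps for different $\mathcal{U}_d$; on the other hand the $\mathcal{S}_n$-irrep $\mathcal{P}_\lambda$ is uniquely labeled by $\lambda$ since $n = \sum_i \lambda_i$.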
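As a concrete illustration of the label set, the following sketch enumerates the partitions of $n$ into at most $d$ parts, padded with trailing zeroes to length $d$. The function name `partitions` and the tuple representation are choices made here for illustration, not notation from the paper.

```python
def partitions(n, d, max_part=None):
    """Yield all partitions of n into at most d parts, as length-d tuples."""
    if max_part is None:
        max_part = n
    if d == 0:
        if n == 0:
            yield ()
        return
    # Choose the largest remaining part first so the tuple is nonincreasing.
    for first in range(min(n, max_part), -1, -1):
        for rest in partitions(n - first, d - 1, first):
            yield (first,) + rest

labels = list(partitions(4, 2))
print(labels)
```

Taking $d = n$ recovers all partitions of $n$, matching the convention that trailing zeroes are ignored.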

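For the case of $n$ qudits, Schur duality states that there exists a basis (which we label $|\lambda\rangle|q_\lambda\rangle|p_\lambda\rangle_{\text{Sch}}$ and call the Schur basis) which simultaneously decomposes the action of $\mathbf{P}(s)$ and $\mathbf{Q}(U)$ into irreps:

$$\mathbf{Q}(U)|\lambda\rangle|q_\lambda\rangle|p_\lambda\rangle_{\text{Sch}} = |\lambda\rangle(\mathbf{q}_\lambda^d(U)|q_\lambda\rangle)|p_\lambda\rangle_{\text{Sch}}$$

$$\mathbf{P}(s)|\lambda\rangle|q_\lambda\rangle|p_\lambda\rangle_{\text{Sch}} = |\lambda\rangle|q_\lambda\rangle(\mathbf{p}_\lambda(s)|p_\lambda\rangle)_{\text{Sch}}$$

and that the common representation space $(\mathbb{C}^d)^{\otimes n}$ decomposes as

(2.1)  $$(\mathbb{C}^d)^{\otimes n} \stackrel{\mathcal{U}_d \times \mathcal{S}_n}{\cong} \bigoplus_{\lambda \in \mathcal{I}_{d,n}} \mathcal{Q}_\lambda^d \otimes \mathcal{P}_\lambda.$$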
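A quick consistency check on (2.1) is that the dimensions on both sides agree, i.e. $d^n = \sum_\lambda \dim\mathcal{Q}_\lambda^d \cdot \dim\mathcal{P}_\lambda$. The sketch below assumes two standard dimension formulas: the Weyl formula $\prod_{i<j}(\lambda_i - \lambda_j + j - i)/(j - i)$ for $\dim\mathcal{Q}_\lambda^d$, which is not stated in the text, and the factorial formula for $\dim\mathcal{P}_\lambda$ that appears later as (3.5). All function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def partitions(n, d, max_part=None):
    """Yield partitions of n into at most d parts, as length-d tuples."""
    if max_part is None:
        max_part = n
    if d == 0:
        if n == 0:
            yield ()
        return
    for first in range(min(n, max_part), -1, -1):
        for rest in partitions(n - first, d - 1, first):
            yield (first,) + rest

def dim_P(lam):
    """Dimension of the S_n irrep: n! prod_{i<j}(l_i - l_j) / prod_i l_i!, l_i = lam_i + d - i."""
    d, n = len(lam), sum(lam)
    ell = [lam[i] + d - 1 - i for i in range(d)]
    num = factorial(n)
    for i in range(d):
        for j in range(i + 1, d):
            num *= ell[i] - ell[j]
    den = 1
    for l in ell:
        den *= factorial(l)
    return num // den

def dim_Q(lam):
    """Dimension of the U_d irrep via the Weyl dimension formula (assumed here)."""
    d = len(lam)
    out = Fraction(1)
    for i in range(d):
        for j in range(i + 1, d):
            out *= Fraction(lam[i] - lam[j] + j - i, j - i)
    return int(out)

def total_dimension(n, d):
    """Sum of dim Q * dim P over all labels in I_{d,n}."""
    return sum(dim_Q(lam) * dim_P(lam) for lam in partitions(n, d))

print(total_dimension(4, 3), 3 ** 4)
```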

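The Schur basis can be expressed as superpositions over the standard computational basis states $|i_1, i_2, \ldots, i_n\rangle$ as

(2.2)  $$|\lambda, q_\lambda, p_\lambda\rangle_{\text{Sch}} = \sum_{i_1, i_2, \ldots, i_n} \left[U_{\text{Sch}}\right]^{\lambda, q_\lambda, p_\lambda}_{i_1, i_2, \ldots, i_n} |i_1 i_2 \cdots i_n\rangle,$$

where $U_{\text{Sch}}$ is the unitary transformation implementing the isomorphism in (2.1). Thus, for any $U \in \mathcal{U}_d$ and any $s \in \mathcal{S}_n$,

(2.3)  $$U_{\text{Sch}} \mathbf{Q}(U) \mathbf{P}(s) U_{\text{Sch}}^\dagger = \sum_{\lambda \in \mathcal{I}_{d,n}} |\lambda\rangle\langle\lambda| \otimes \mathbf{q}_\lambda^d(U) \otimes \mathbf{p}_\lambda(s).$$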
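The smallest nontrivial instance of (2.3) is $n = d = 2$, where the Schur basis is the familiar triplet/singlet basis of two qubits. The sketch below is an illustration under that assumption: the matrix `U_SCH`, whose rows are the Schur basis vectors, and the helper names are introduced here and are not the paper's circuit. It conjugates $\mathbf{Q}(V) = V \otimes V$ and the swap $\mathbf{P}(s)$ and exposes the block structure.

```python
import math

r = 1 / math.sqrt(2)
# Rows: Schur basis vectors in the computational basis |00>, |01>, |10>, |11>.
U_SCH = [
    [1, 0, 0, 0],   # lambda = (2,0): |00>
    [0, r, r, 0],   # lambda = (2,0): (|01> + |10>)/sqrt(2)
    [0, 0, 0, 1],   # lambda = (2,0): |11>
    [0, r, -r, 0],  # lambda = (1,1): singlet (|01> - |10>)/sqrt(2)
]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def conjugate(M):
    """U_Sch M U_Sch^dagger; U_Sch is real here, so dagger is transpose."""
    return matmul(matmul(U_SCH, M), transpose(U_SCH))

theta = 0.7
V = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

Q_sch = conjugate(kron(V, V))  # block diagonal: 3x3 block plus 1x1 block
P_sch = conjugate(SWAP)        # +1 on the symmetric block, -1 on the singlet
```

In the Schur basis, $V \otimes V$ does not mix the three-dimensional $\lambda = (2,0)$ block with the one-dimensional $\lambda = (1,1)$ block, and the swap acts as a scalar on each block, as (2.3) requires when both $\mathcal{P}_\lambda$ are one-dimensional.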

If we now think of $U_{\text{Sch}}$ as a quantum circuit, it will map the Schur basis state $|\lambda, q_\lambda, p_\lambda\rangle_{\text{Sch}}$ to the computational basis state $|\lambda, q_\lambda, p_\lambda\rangle$ with $\lambda$, $q_\lambda$, and $p_\lambda$ expressed as bit strings. The dimensions of the irreps $\mathbf{p}_\lambda$ and $\mathbf{q}_\lambda^d$ vary with $\lambda$, so we will need to pad the $|q_\lambda, p_\lambda\rangle$ registers when they are expressed as bit strings. We will label the padded basis as $|\lambda\rangle|q\rangle|p\rangle$, explicitly dropping the $\lambda$ dependence. Later in the paper we will show how to do this padding efficiently with only a logarithmic additive spatial overhead. We will refer to the transform from the computational basis $|i_1, i_2, \ldots, i_n\rangle$ to the basis of three strings $|\lambda\rangle|q\rangle|p\rangle$ as the Schur transform. The Schur transform is shown schematically in Fig. 1. Notice that just as the standard computational basis $|i\rangle$ is arbitrary up to a unitary transform, the bases for $\mathcal{Q}_\lambda^d$ and $\mathcal{P}_\lambda$ are also both arbitrary up to a unitary transform, though we will later choose particular bases for $\mathcal{Q}_\lambda^d$ and $\mathcal{P}_\lambda$.


Figure 1: The Schur transform. Notice how the direct sum over $\lambda$ in (2.1) becomes a tensor product between the $|\lambda\rangle$ register and the $|q\rangle$ and $|p\rangle$ registers. Since the number of qubits needed for $|q\rangle$ and $|p\rangle$ varies with $\lambda$, we need slightly more spatial resources, which are here denoted by the ancilla input $|0\rangle$.

2.2  Applications of the Schur Transform. The Schur transform is useful in a surprisingly large number of quantum information protocols. Here we review these applications, with particular attention to the use of the Schur transform circuit in each protocol. We emphasize again that our construction of the Schur transform simultaneously makes all of these tasks computationally efficient.

2.2.1  Spectrum and state estimation. Suppose we are given many copies of an unknown mixed quantum state, $\rho^{\otimes n}$, and wish to estimate the spectrum of $\rho$. An asymptotically optimal estimate (in the sense of the error exponent of large deviations) for the spectrum of $\rho$ can be obtained by applying the Schur transform, measuring $\lambda$ and taking the spectrum estimate to be $(\lambda_1/n, \ldots, \lambda_d/n)$ [13, 24]. Thus an efficient implementation of the Schur transform will efficiently implement the spectrum estimating protocol (note that it is efficient in $d$, not in $\log(d)$). Estimating $\rho$ reduces to measuring $|\lambda\rangle$ and $|q\rangle$, although an optimal estimator is known only for the case of $d = 2$ [33]. Further, optimal quantum hypothesis testing can be obtained by a similar, but more complicated, protocol [17, 34]. The only one of these known to have an implementation not based on the Schur transform is spectrum estimation, which can be performed using the Fourier transform on $\mathcal{S}_n$ with a technique known as “generalized phase estimation” or as “the phase kickback trick” [35].
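To make the spectrum estimate concrete, the sketch below computes the outcome distribution of the $\lambda$ measurement for qubits ($d = 2$). It relies on a fact implied by (2.1) but not spelled out in the text: the probability of outcome $\lambda$ is $\dim\mathcal{P}_\lambda \cdot \mathrm{Tr}\,\mathbf{q}_\lambda^d(\rho)$, and for $d = 2$ the trace is the two-variable Schur polynomial in the eigenvalues $(r_1, r_2)$. The two-row dimension formula $\binom{n}{b} - \binom{n}{b-1}$ and all function names are assumptions of this illustration.

```python
from math import comb

def dim_P2(a, b):
    """Dimension of the S_n irrep labeled by the two-row partition (a, b)."""
    n = a + b
    return comb(n, b) - (comb(n, b - 1) if b > 0 else 0)

def schur2(a, b, x, y):
    """Two-variable Schur polynomial s_(a,b)(x, y) = sum_{k=b}^{a} x^k y^(a+b-k)."""
    return sum(x ** k * y ** (a + b - k) for k in range(b, a + 1))

def lambda_distribution(n, r1):
    """Probability of each outcome lambda = (n-b, b) for a qubit state with spectrum (r1, 1-r1)."""
    r2 = 1.0 - r1
    return {(n - b, b): dim_P2(n - b, b) * schur2(n - b, b, r1, r2)
            for b in range(n // 2 + 1)}

def mean_estimate(n, r1):
    """Expected value of the estimate lambda_1 / n for the larger eigenvalue."""
    return sum(p * lam[0] / n for lam, p in lambda_distribution(n, r1).items())

print(mean_estimate(200, 0.8))
```

The distribution concentrates around $\lambda/n \approx (r_1, r_2)$ as $n$ grows, which is the behavior the protocol exploits.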

2.2.2  Universal distortion-free entanglement concentration. Let $|\psi\rangle_{AB}$ be a bipartite partially entangled state shared between two parties, A and B. Suppose we are given many copies of $|\psi\rangle_{AB}$ and we want to transform these states into copies of a maximally entangled state using only local operations and classical communication. Further, suppose that we wish this protocol to be universal, meaning it works when neither A nor B knows the state $|\psi\rangle_{AB}$, unlike the original entanglement concentration protocol in [10]. Universal distortion-free (meaning zero error, but the entanglement yield is a random variable) entanglement concentration can be performed [14] by both parties performing Schur transforms on their $n$ halves of $|\psi\rangle_{AB}$, measuring their $|\lambda\rangle$, discarding $|q\rangle$ and retaining $|p\rangle$. The two parties will now share a maximally entangled state of varying dimension depending on which $\lambda$ was measured. This dimension is $2^{n(H \pm o(1))}$ with probability $1 - o(1)$, where $H$ is the entropy of one of the parties’ reduced mixed states, and in fact this protocol is optimal even in a non-asymptotic sense (i.e. for each finite value of $n$) [23]. While universal entanglement concentration can also be efficiently achieved without the Schur transform by using some of the copies to perform tomography, this introduces $\Omega(n^{-1/2})$ errors.

2.2.3  Universal Compression with Optimal Overflow Exponent. Measuring $|\lambda\rangle$ weakly so as to cause little disturbance, together with appropriate relabeling, comprises a universal compression algorithm with optimal overflow exponent (rate of decrease of the probability that the algorithm will output a state that is much too large) [15, 16]. In the fixed-rate setting (where the entropy of the source is promised to be less than the rate), performing a projective measurement on $\lambda$ will compress while incurring the optimal $\exp(-O(n))$ error rate [36]. The only known efficient procedures not relying on the Schur transform have worse overflow exponents [37, 38] and introduce $\Omega(n^{-1/4})$ errors even in the fixed-rate setting.


2.2.4  Encoding and decoding into decoherence-free subsystems. Further applications of the Schur transform include encoding into decoherence-free subsystems [18, 19, 20, 21]. Decoherence-free subsystems are subspaces of a system’s Hilbert space which are immune to decoherence due to a symmetry of the system-environment interaction. For the case where the environment couples identically to all systems, information can be protected from decoherence by encoding into the $|p_\lambda\rangle$ basis. We can use the inverse Schur transform (which, as a circuit, can be implemented by reversing the order of all gate elements and replacing them with their inverses) to perform this encoding: simply feed the appropriate $|\lambda\rangle$, the state to be encoded in the $|p\rangle$ register, and any state in the $|q\rangle$ register into the inverse Schur transform. Decoding can similarly be performed using the Schur transform. Previously no efficient algorithms for encoding or decoding were known.

2.2.5  Communication without a shared reference frame. An application of the concepts of decoherence-free subsystems comes about when two parties wish to communicate (in either a classical or quantum manner) when the parties do not share a reference frame. The effect of not sharing a reference frame is the same as the effect of collective decoherence (the same random unitary rotation has been applied to each subsystem). Thus encoding information into the $|p\rangle$ register will allow $n - O(\log n)$ qudits to be sent noiselessly with $n$ uses of the channel in spite of the fact that the two parties do not share a reference frame [22]. Just as with decoherence-free subsystems, this encoding and decoding can be done with the Schur transform. Previously, the best known efficient procedure used $m$ out of the $n$ channel uses for tomography, resulting in overall error.

3  Subgroup adapted bases and the Schur basis

In the last section, we defined the Schur transform in a way that left the basis almost completely arbitrary. To construct a quantum circuit for the Schur transform, we need to explicitly specify the Schur basis. Since we want the Schur basis to be of the form $|\lambda, q, p\rangle$, this reduces to specifying orthonormal bases for $\mathcal{Q}_\lambda^d$ and $\mathcal{P}_\lambda$, which we will call $Q_\lambda^d$ and $P_\lambda$, respectively. We will choose $Q_\lambda^d$ and $P_\lambda$ to both be a type of basis known as a subgroup-adapted basis, an idea first introduced to quantum information in [28, 29].

The key idea is to examine how an irrep $(\mathbf{r}, V)$ of a group $G$ decomposes into $H$-irreps when restricted to $H$ (denoted $(\mathbf{r}|_H, V{\downarrow}_H)$). This behavior is called branching, and when no irrep of $H$ ever appears more than once in an irrep of $G$, we call the branching multiplicity-free. Now consider a tower of groups $G = G_1 \supset G_2 \supset \cdots \supset G_{k-1} \supset G_k = \{e\}$ where $\{e\}$ is the trivial subgroup consisting of only the identity element. We call this a canonical tower when the branching for each $G_{i+1} \subset G_i$ is multiplicity-free. Finally we can define a subgroup-adapted basis (unique up to an arbitrary choice of phase) in which basis vectors have the form $|\alpha_k, \ldots, \alpha_2\rangle$, where each $\alpha_i \in \hat{G}_i$ (the set of irreps of $G_i$) and $\alpha_{i+1}$ appears in the decomposition of $\alpha_i{\downarrow}_{G_{i+1}}$.

We now need only to specify canonical towers of subgroups for $\mathcal{U}_d$ and $\mathcal{S}_n$, which will give rise to subgroup-adapted bases for the irreps $\mathcal{Q}_\lambda^d$ and $\mathcal{P}_\lambda$ known as the Gel’fand-Zetlin basis [26] and the Young-Yamanouchi basis (or sometimes Young’s orthogonal basis [27]), respectively.

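The Gel’fand-Zetlin basis for $\mathcal{Q}_\lambda^d$: For $\mathcal{U}_d$, it turns out that the chain of subgroups $\{1\} = \mathcal{U}_0 \subset \mathcal{U}_1 \subset \cdots \subset \mathcal{U}_{d-1} \subset \mathcal{U}_d$ is a canonical tower, where we have $\mathcal{U}_{d-1}$ embedded in $\mathcal{U}_d$ by $\mathcal{U}_{d-1} := \{u \in \mathcal{U}_d : u|d\rangle = |d\rangle\}$. Since the branching from $\mathcal{U}_d$ to $\mathcal{U}_{d-1}$ is multiplicity-free, we obtain a subgroup-adapted basis $Q_\lambda^d$, which is known as the Gel’fand-Zetlin (GZ) basis. Our only free choice in a GZ basis is the initial choice of basis $|1\rangle, \ldots, |d\rangle$ for $\mathbb{C}^d$, which determines the canonical tower of subgroups $\mathcal{U}_1 \subset \cdots \subset \mathcal{U}_d$. Once we have chosen this basis, specifying $Q_\lambda^d$ reduces to knowing which irreps $\mathcal{Q}_\mu^{d-1}$ appear in the decomposition of $\mathcal{Q}_\lambda^d{\downarrow}_{\mathcal{U}_{d-1}}$. Recall that the irreps of $\mathcal{U}_d$ are labeled by elements of $\mathcal{I}_{d,n}$ with $n$ arbitrary. This set can be denoted by $\mathbb{Z}^d_{++} := \cup_n \mathcal{I}_{d,n} = \{\lambda \in \mathbb{Z}^d : \lambda_1 \ge \cdots \ge \lambda_d \ge 0\}$. For $\mu \in \mathbb{Z}^{d-1}_{++}$, $\lambda \in \mathbb{Z}^d_{++}$, we say that $\mu$ interlaces $\lambda$ and write $\mu \precsim \lambda$ whenever $\lambda_1 \ge \mu_1 \ge \lambda_2 \ge \cdots \ge \lambda_{d-1} \ge \mu_{d-1} \ge \lambda_d$. In terms of Young diagrams, this means that $\mu$ is a valid partition (i.e. a nonnegative, nonincreasing sequence) obtained from removing zero or one boxes from each column of $\lambda$. The irreps $\mathcal{Q}_\mu^{d-1}$ appearing in $\mathcal{Q}_\lambda^d{\downarrow}_{\mathcal{U}_{d-1}}$ are exactly those with $\mu \precsim \lambda$. Thus a basis vector in $Q_\lambda^d$ corresponds to a sequence of partitions $q = (q_d = \lambda, \ldots, q_1)$ such that $q_1 \precsim q_2 \precsim \cdots \precsim q_d$ and $q_j \in \mathbb{Z}^j_{++}$ for $j = 1, \ldots, d$.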
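The interlacing condition translates directly into an enumeration of GZ basis labels. The sketch below is illustrative only: the function names are hypothetical and indices are 0-based. It lists all chains $q = (q_d = \lambda, \ldots, q_1)$ and compares their number with the Weyl dimension formula, a standard result assumed here rather than taken from the text.

```python
from itertools import product

def interlacing(lam):
    """All mu with lam[0] >= mu[0] >= lam[1] >= ... >= mu[d-2] >= lam[d-1]."""
    ranges = [range(lam[i + 1], lam[i] + 1) for i in range(len(lam) - 1)]
    return [tuple(mu) for mu in product(*ranges)]

def gz_patterns(lam):
    """Yield GZ labels (q_d = lam, ..., q_1), each q_j a partition with j parts."""
    if len(lam) == 1:
        yield (lam,)
        return
    for mu in interlacing(lam):
        for tail in gz_patterns(mu):
            yield (lam,) + tail

def weyl_dimension(lam):
    """prod_{i<j} (lam_i - lam_j + j - i) / (j - i), assumed as the dimension of Q_lam^d."""
    num = den = 1
    d = len(lam)
    for i in range(d):
        for j in range(i + 1, d):
            num *= lam[i] - lam[j] + j - i
            den *= j - i
    return num // den

patterns = list(gz_patterns((2, 1, 0)))
print(len(patterns), weyl_dimension((2, 1, 0)))
```

Each entry of `patterns` is one basis vector of $Q_\lambda^d$; for $\lambda = (2,1,0)$ these are the labels of the adjoint-like irrep of $\mathcal{U}_3$.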

In order to work with the Gel’fand-Zetlin basis vectors on a quantum computer, we will need an efficient method to write them down. If $d$ is small compared to $n$ (as in many information theory applications), we can write an element of $\mathcal{I}_{d,n}$ with $\lceil d \log(n+1) \rceil$ bits, since it consists of $d$ integers between $0$ and $n$. A Gel’fand-Zetlin basis vector then requires no more than $\lceil d^2 \log(n+1) \rceil$ bits, since it can be expressed as $d$ partitions of integers no greater than $n$ into $\le d$ parts. Unless otherwise specified, our algorithms will use this encoding of the GZ basis vectors. However, another encoding, known as semistandard Young tableaux, can represent a GZ basis vector using $n \lceil \log d \rceil$ bits. To encode $q = (q_d, \ldots, q_1)$, we fill the Young diagram of $\lambda$ with $n$ integers from $\{1, \ldots, d\}$ in a pattern that is nondecreasing from left to right, strictly increasing from top to bottom, and such that for each $d' \le d$, removing all boxes with integers larger than $d'$ leaves the Young diagram of $q_{d'}$.

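The Young-Yamanouchi basis for $\mathcal{P}_\lambda$: The situation for $\mathcal{S}_n$ is quite similar. Our chain of subgroups is $\{e\} = \mathcal{S}_1 \subset \mathcal{S}_2 \subset \cdots \subset \mathcal{S}_n$, where for $m < n$ we define $\mathcal{S}_m \subset \mathcal{S}_n$ to be the permutations in $\mathcal{S}_n$ which leave the last $n - m$ elements fixed. Recall that the irreps of $\mathcal{S}_n$ can be labeled by $\mathcal{I}_n = \mathcal{I}_{n,n}$: the partitions of $n$ into $\le n$ parts.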

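Again, the branching from $\mathcal{S}_n$ to $\mathcal{S}_{n-1}$ is multiplicity-free, so to determine an orthonormal basis $P_\lambda$ for the space $\mathcal{P}_\lambda$ we need only know which irreps occur in the decomposition of $\mathcal{P}_\lambda{\downarrow}_{\mathcal{S}_{n-1}}$. It turns out that the branching rule is given by finding all ways to remove one box from $\lambda$ while leaving a valid partition. Denote the set of such partitions $\lambda - \square$. Formally, $\lambda - \square := \mathcal{I}_{n-1} \cap \{\lambda - e_j : j = 1, \ldots, n\}$, where $e_j$ is the unit vector in $\mathbb{Z}^n$ with a one in the $j$th position and zeroes elsewhere. Thus, the general branching rule is

(3.4)  $$\mathcal{P}_\lambda{\downarrow}_{\mathcal{S}_{n-1}} \stackrel{\mathcal{S}_{n-1}}{\cong} \bigoplus_{\lambda' \in \lambda - \square} \mathcal{P}_{\lambda'}.$$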
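Iterating the branching rule (3.4) down to $\mathcal{S}_1$ gives a simple recursive way to count Young-Yamanouchi basis vectors. The sketch below uses hypothetical helper names, with partitions stored as fixed-length tuples padded with zeroes. It computes $\lambda - \square$ and the resulting dimension of $\mathcal{P}_\lambda$.

```python
from functools import lru_cache

def remove_box(lam):
    """The set lam - box: partitions obtained by removing one box from lam."""
    out = []
    for j in range(len(lam)):
        below = lam[j + 1] if j + 1 < len(lam) else 0
        if lam[j] > below:  # removing a box from row j keeps rows nonincreasing
            out.append(lam[:j] + (lam[j] - 1,) + lam[j + 1:])
    return out

@lru_cache(maxsize=None)
def dim_P(lam):
    """dim P_lam obtained by applying (3.4) repeatedly until one box is left."""
    if sum(lam) == 1:
        return 1
    return sum(dim_P(mu) for mu in remove_box(lam))

print(remove_box((2, 1, 0)), dim_P((3, 2, 0)))
```

Memoization keeps the recursion polynomial in the number of distinct sub-partitions, which is convenient for small experiments but is not the closed-form route the paper takes.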


This  chain  can  be  concisely  labelled  in  [log  rd] 
bits  by  writing  the  number  j  in  the  box  of  the 
Young  frame  that  is  removed  when  restricting  from 
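$\mathcal{S}_j$ to $\mathcal{S}_{j-1}$. However, for applications such as data compression [15, 16] we will need an encoding which gives us closer to the optimal $\lceil \log |P_\lambda| \rceil$ bits. First we note an exact (and efficiently computable) expression [31, 27] for $|P_\lambda| = \dim \mathcal{P}_\lambda$:

(3.5)  $$\dim \mathcal{P}_\lambda = \frac{n! \prod_{1 \le i < j \le d} (\lambda_i - \lambda_j + j - i)}{(\lambda_1 + d - 1)! \, (\lambda_2 + d - 2)! \cdots \lambda_d!}.$$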

Now we would like to efficiently and reversibly map an element of $P_\lambda$ (thought of as a chain of partitions $p = (p_n = \lambda, \ldots, p_1 = (1)) \in P_\lambda$, with $p_j \in p_{j+1} - \square$) to an integer in $[|P_\lambda|] := \{1, \ldots, |P_\lambda|\}$. We will construct this bijection $f_n : P_\lambda \to [|P_\lambda|]$ by defining an ordering on $P_\lambda$ and setting $f_n(p) := |\{p' \in P_\lambda : p' \le p\}|$. First fix an arbitrary, but easily computable, (total) ordering on partitions in $\mathcal{I}_n$ for each $n$; for example, lexicographical order. This induces an ordering on $P_\lambda$ if we rank a basis vector $p \in P_\lambda$ first according to $p_{n-1}$, using the order on partitions we have chosen, then according to $p_{n-2}$ and so on. We skip $p_n$, since it is always equal to $\lambda$. In other words, for $p, p' \in P_\lambda$, $p > p'$ if $p_{n-1} > p'_{n-1}$, or $p_{n-1} = p'_{n-1}$ and $p_{n-2} > p'_{n-2}$, or $p_{n-1} = p'_{n-1}$, $p_{n-2} = p'_{n-2}$ and $p_{n-3} > p'_{n-3}$, and so on. Thus $f_n : P_\lambda \to [|P_\lambda|]$ can be easily verified to be

(3.6)  $$f_n(p) = f_n(p_1, \ldots, p_n) := 1 + \sum_{k=2}^{n} \sum_{\substack{\lambda' \in p_k - \square \\ \lambda' < p_{k-1}}} \dim \mathcal{P}_{\lambda'}.$$
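The ranking map of (3.6) can be prototyped classically. The sketch below is an illustration with hypothetical names: chains are stored as tuples `(p_n, ..., p_1)`, partitions are compared with Python's lexicographic tuple order (one admissible choice of total order), and dimensions are computed recursively rather than from (3.5).

```python
from functools import lru_cache

def remove_box(lam):
    """Partitions obtained by removing one box from lam (fixed-length tuples)."""
    out = []
    for j in range(len(lam)):
        below = lam[j + 1] if j + 1 < len(lam) else 0
        if lam[j] > below:
            out.append(lam[:j] + (lam[j] - 1,) + lam[j + 1:])
    return out

@lru_cache(maxsize=None)
def dim_P(lam):
    """Number of chains from lam down to a single box."""
    return 1 if sum(lam) == 1 else sum(dim_P(mu) for mu in remove_box(lam))

def chains(lam):
    """All chains (p_n = lam, ..., p_1) with p_j in p_{j+1} - box."""
    if sum(lam) == 1:
        yield (lam,)
        return
    for mu in remove_box(lam):
        for tail in chains(mu):
            yield (lam,) + tail

def rank(chain):
    """f_n(p) from (3.6): one plus the number of chains that precede p."""
    total = 1
    for upper, lower in zip(chain, chain[1:]):
        total += sum(dim_P(mu) for mu in remove_box(upper) if mu < lower)
    return total

ranks = sorted(rank(c) for c in chains((3, 2, 0)))
print(ranks)
```

Because each chain receives a distinct integer in $\{1, \ldots, \dim\mathcal{P}_\lambda\}$, the map can be inverted by peeling off one partition at a time, which is what makes it usable as a reversible compression of the $|p\rangle$ register.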


Thus $f_n$ is an injective map from $P_\lambda$ to $[|P_\lambda|]$ that is computable in time polynomial in $n$. Unfortunately, similar techniques for $Q_\lambda^d$ take time $\Omega(n^d)$.

4  The Clebsch-Gordan transform and efficient circuits for the Schur transform

In this section, we describe an efficient circuit for the Schur transform $U_{\text{Sch}}$. The key building block will be the $\mathcal{U}_d$ Clebsch-Gordan (CG) transform, which decomposes a Kronecker product of two $\mathcal{U}_d$-irreps $\mathcal{Q}_\mu^d \otimes \mathcal{Q}_\nu^d$ into a direct sum of other $\mathcal{U}_d$-irreps (described in Sec. 4.1). Then we will give efficient recursive constructions for the Schur transform and the CG transform. In Sec. 4.2, we will show how the Schur transform on $(\mathbb{C}^d)^{\otimes n}$ reduces to a Schur transform on $(\mathbb{C}^d)^{\otimes n-1}$ and a CG transform. Then in Sec. 4.3, we will show how the $\mathcal{U}_d$ CG transform reduces to a $\mathcal{U}_{d-1}$ CG transform and an efficiently-calculable $d \times d$ unitary matrix known as a reduced Wigner transform. Together this will yield poly-time algorithms for the CG and Schur transforms.

4.1  The Clebsch-Gordan Series and Transform. The Clebsch-Gordan decomposition describes the reduction of a tensor product representation into irreps. We specialize to the case when the group is $\mathcal{U}_d$, one of the irreps is $\mathcal{Q}_\lambda^d$ and the other is $\mathcal{Q}_{(1)}^d = \mathbb{C}^d$, the $d$-dimensional defining irrep of $\mathcal{U}_d$. The representation $\mathcal{Q}_\lambda^d \otimes \mathcal{Q}_{(1)}^d$ is generally reducible and decomposes as

(4.7)  $$\mathcal{Q}_\lambda^d \otimes \mathcal{Q}_{(1)}^d \stackrel{\mathcal{U}_d}{\cong} \bigoplus_{\lambda' \in \lambda + \square} \mathcal{Q}_{\lambda'}^d.$$

Here $\lambda + \square = \{\lambda + e_j : j \in [d]\} \cap \mathbb{Z}^d_{++}$ is the “add a single box” prescription for tensoring in a defining representation of $\mathcal{U}_d$: we add a single box to a Young diagram and if the new Young diagram is a valid Young diagram (i.e. corresponds to a valid partition), then this irrep appears in the Clebsch-Gordan series. By Schur duality, this statement is equivalent to the “remove a box” $\mathcal{S}_n \supset \mathcal{S}_{n-1}$ branching rule stated in Sec. 3.

We now seek to define the CG transform as a quantum circuit. One of the input irreps will always be the defining irrep, but we allow the other irrep to be specified by a quantum input. If we define $U_{\text{CG}}^\lambda$ to be the transform relating the two sides of (4.7), then the CG transform we are interested in is

(4.8)  $$U_{\text{CG}} = \sum_{\lambda \in \mathbb{Z}^d_{++}} |\lambda\rangle\langle\lambda| \otimes U_{\text{CG}}^\lambda.$$

This takes as input a state of the form $|\lambda\rangle|q\rangle|i\rangle$, for $\lambda \in \mathbb{Z}^d_{++}$, $|q\rangle \in Q_\lambda^d$ and $i \in [d]$. The output is a superposition over vectors $|\lambda\rangle|\lambda'\rangle|q'\rangle$, where $\lambda' \in \lambda + \square$ and $|q'\rangle \in Q_{\lambda'}^d$. Equivalently, we could output $|\lambda\rangle|j\rangle|q'\rangle$ or $|j\rangle|\lambda'\rangle|q'\rangle$ (where $\lambda' = \lambda + e_j$) since $(\lambda, \lambda')$, $(\lambda, j)$ and $(\lambda', j)$ are all trivially related via reversible classical circuits.
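A dimension count gives a quick sanity check on the “add a single box” rule in (4.7): $d \cdot \dim\mathcal{Q}_\lambda^d$ should equal the sum of $\dim\mathcal{Q}_{\lambda'}^d$ over $\lambda' \in \lambda + \square$. The sketch below uses hypothetical function names and assumes the Weyl dimension formula, which is not stated in the text.

```python
def add_box(lam):
    """The set lam + box: add one box to row j whenever the result is a partition."""
    out = []
    for j in range(len(lam)):
        if j == 0 or lam[j - 1] > lam[j]:
            out.append(lam[:j] + (lam[j] + 1,) + lam[j + 1:])
    return out

def weyl_dimension(lam):
    """prod_{i<j} (lam_i - lam_j + j - i) / (j - i), assumed as dim Q_lam^d."""
    num = den = 1
    d = len(lam)
    for i in range(d):
        for j in range(i + 1, d):
            num *= lam[i] - lam[j] + j - i
            den *= j - i
    return num // den

def cg_dimensions_match(lam):
    """Compare the dimensions of the two sides of (4.7)."""
    d = len(lam)
    return weyl_dimension(lam) * d == sum(weyl_dimension(nu) for nu in add_box(lam))

print(add_box((2, 1, 0)), cg_dimensions_match((2, 1, 0)))
```

The index $j$ of the row that received the box is exactly the classical label that the text notes can replace either $\lambda$ or $\lambda'$ in the output.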


Figure 2: Schematic of the Clebsch-Gordan transform (inputs $|\lambda\rangle$, $|q\rangle$, $|i\rangle$). Equivalently, we could replace either the $\lambda$ output or the $\lambda'$ output with $j$ such that $\lambda' = \lambda + e_j$.


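4.2  Constructing the Schur Transform from Clebsch-Gordan Transforms. We now describe how to construct the Schur transform out of a series of Clebsch-Gordan transforms. Begin by decomposing $(\mathbb{C}^d)^{\otimes n}$ in two different ways. First, we Schur-decompose the first $n - 1$ qudits,

(4.9)  $$(\mathbb{C}^d)^{\otimes n-1} \otimes \mathbb{C}^d \stackrel{\mathcal{U}_d \times \mathcal{S}_{n-1}}{\cong} \bigoplus_{\lambda \in \mathcal{I}_{d,n-1}} \mathcal{Q}_\lambda^d \otimes \mathcal{P}_\lambda \otimes \mathbb{C}^d.$$

Next, combine $\mathcal{Q}_\lambda^d$ and $\mathbb{C}^d$ using the CG transform (4.7), then rearrange terms to obtain

(4.10)  $$(\mathbb{C}^d)^{\otimes n} \cong \bigoplus_{\lambda \in \mathcal{I}_{d,n-1}} \bigoplus_{\lambda' \in \lambda + \square} \mathcal{Q}_{\lambda'}^d \otimes \mathcal{P}_\lambda \cong \bigoplus_{\lambda' \in \mathcal{I}_{d,n}} \mathcal{Q}_{\lambda'}^d \otimes \Bigl( \bigoplus_{\lambda \in \lambda' - \square} \mathcal{P}_\lambda \Bigr).$$

On the other hand, we have $(\mathbb{C}^d)^{\otimes n} \cong \bigoplus_{\lambda' \in \mathcal{I}_{d,n}} \mathcal{Q}_{\lambda'}^d \otimes \mathcal{P}_{\lambda'}$ from (2.1). These two decompositions can be equated using (3.4), the branching rule for $\mathcal{S}_{n-1} \subset \mathcal{S}_n$.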

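Now we will turn the representation-theoretic arguments of the last paragraph into an algorithm. The Schur transform on $(\mathbb{C}^d)^{\otimes n}$ starts with inputs of the form $|i_1, \ldots, i_n\rangle \in (\mathbb{C}^d)^{\otimes n}$ and is implemented as follows:

1.  Perform the Schur transform on the first $n - 1$ registers (corresponding to (4.9)) to obtain $|\lambda^{(n-1)}\rangle|q^{(n-1)}\rangle|p^{(n-1)}\rangle|i_n\rangle$.

2.  Perform the CG transform (as in (4.10)) on $|\lambda^{(n-1)}\rangle|q^{(n-1)}\rangle|i_n\rangle$ to obtain $|\lambda^{(n-1)}\rangle|\lambda^{(n)}\rangle|q^{(n)}\rangle$.

3.  Set $|\lambda\rangle = |\lambda^{(n)}\rangle$ and $|q\rangle = |q^{(n)}\rangle$. Concatenate $\lambda^{(n-1)}$ and $p^{(n-1)}$ to form the Young-Yamanouchi basis element $|p\rangle = |\lambda^{(n-1)}, p^{(n-1)}\rangle \in \mathcal{P}_\lambda$.

The base case of this recursion is simply the trivial $n = 1$ relabelling corresponding to $\mathcal{Q}_{(1)}^d = \mathbb{C}^d$ and $\mathcal{P}_{(1)} = \mathbb{C}$.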

We can also express this algorithm for the Schur transform without the need for recursion. On input $|i_1, \ldots, i_n\rangle \in (\mathbb{C}^d)^{\otimes n}$, we combine each of $|i_1\rangle, \ldots, |i_n\rangle$ using the CG transform, one at a time. We start by inputting $|\lambda^{(1)}\rangle = |(1)\rangle$, $|i_1\rangle$ and $|i_2\rangle$ into $U_{\text{CG}}$, which outputs $|\lambda^{(1)}\rangle$ and a superposition of different values of $|\lambda^{(2)}\rangle$ and $|q_2\rangle$. Here $\lambda^{(2)}$ can be either $(2, 0)$ or $(1, 1)$ and $|q_2\rangle \in Q^d_{\lambda^{(2)}}$. Continuing, we apply $U_{\text{CG}}$ to $|\lambda^{(2)}\rangle|q_2\rangle|i_3\rangle$, and output a superposition of vectors of the form $|\lambda^{(2)}\rangle|\lambda^{(3)}\rangle|q_3\rangle$, with $\lambda^{(3)} \in \mathcal{I}_{d,3}$ and $|q_3\rangle \in Q^d_{\lambda^{(3)}}$. Each time we are combining an arbitrary irrep $\lambda^{(k)}$ and an associated basis vector $|q_k\rangle \in Q^d_{\lambda^{(k)}}$, together with a vector from the defining irrep $|i_{k+1}\rangle$. This is repeated for $k = 1, \ldots, n - 1$ and the resulting circuit is depicted in Fig. 3.

We are left with a superposition of states of the form $|\lambda^{(1)}, \ldots, \lambda^{(n)}\rangle|q_n\rangle$, where $|q_n\rangle \in Q^d_{\lambda^{(n)}}$, $\lambda^{(k)} \in \mathcal{I}_{d,k}$ and each $\lambda^{(k)}$ is obtained by adding a single box to $\lambda^{(k-1)}$; i.e. $\lambda^{(k)} = \lambda^{(k-1)} + e_{j_k}$ for some $j_k \in [d]$. Again we relabel $\lambda = \lambda^{(n)}$, $|q\rangle = |q_n\rangle$ and $|p\rangle = |\lambda^{(1)}, \ldots, \lambda^{(n-1)}\rangle$, where we use the fact that our basis for $\mathcal{P}_\lambda$ is adapted to the subgroup tower $\mathcal{S}_1 \subset \cdots \subset \mathcal{S}_n$. Thus, we obtain the desired $|\lambda\rangle|q\rangle|p\rangle$. Finally, we can optionally compress the $|p\rangle$ register to $\lceil \log |P_\lambda| \rceil$ qubits using the techniques in Sec. 3, as is required for applications such as data compression and entanglement concentration, described in Sec. 2.2.

In the next section we will show that a single $\mathcal{U}_d$ CG transform can be performed to accuracy $\epsilon$ in time $\mathrm{poly}(\log n, d, \log 1/\epsilon)$. Thus the entire Schur transform requires time $n \cdot \mathrm{poly}(\log n, d, \log 1/\epsilon)$ plus an optional $\mathrm{poly}(n)$ to compress the $|p\rangle$ register.

Figure 3: Cascading Clebsch-Gordan transforms to produce the Schur transform. Not shown are any ancilla inputs to the Clebsch-Gordan transforms. The structure of inputs and outputs of the Clebsch-Gordan transforms is the same as in Fig. 2.

Remark: If $d > n$, then a slight modification of the above algorithm can run in time $\mathrm{poly}(n, \log d, \log 1/\epsilon)$. First note that given oracle access to $u \in \mathcal{S}_d \subset \mathcal{U}_d$, we need only time $O(n \log d)$ to apply $\mathbf{Q}(u)$ or $\mathbf{q}_\lambda^d(u)$, assuming our GZ basis is written as semistandard Young tableaux. The algorithm first calculates a sorted list $a_1, \ldots, a_m$ of the $m \le n$ distinct symbols occurring in $i_1, \ldots, i_n$. Next we will map $|a_j\rangle$ to $|j\rangle$ for each occurrence of $a_j$ in the input string. Now we can apply the Schur transform to $(\mathbb{C}^m)^{\otimes n}$, which runs in time $\mathrm{poly}(n, \log 1/\epsilon)$. Finally we apply the inverse map $|j\rangle \to |a_j\rangle$ to $|q\rangle$, and use $|q\rangle$ to uncompute $a_1, \ldots, a_m$.
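The classical bookkeeping in this remark, as it acts on a single computational basis string, can be sketched as follows. The function names are hypothetical; a circuit would perform the same relabeling reversibly and in superposition, which this sketch does not attempt to model.

```python
def compress_alphabet(symbols):
    """Map the m <= n distinct input symbols to 1..m, preserving their order."""
    alphabet = sorted(set(symbols))  # a_1 < a_2 < ... < a_m
    index = {a: j for j, a in enumerate(alphabet, start=1)}
    return alphabet, [index[a] for a in symbols]

def restore_alphabet(alphabet, labels):
    """Inverse relabeling j -> a_j."""
    return [alphabet[j - 1] for j in labels]

alphabet, labels = compress_alphabet([907, 12, 907, 5000])
print(alphabet, labels)
```

After the relabeling, every symbol lies in $\{1, \ldots, m\}$ with $m \le n$, so the subsequent Schur transform has a cost that depends on $d$ only through the $O(n \log d)$ relabeling.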

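4.3  Efficient circuits for the Clebsch-Gordan transform. We now describe how to construct the $\mathcal{U}_d$ CG transform described in (4.7) and (4.8). The key to an efficient algorithm will be to reduce the $\mathcal{U}_d$ CG transform to a $\mathcal{U}_{d-1}$ CG transform. As with the Schur transform, our strategy will be to decompose a representation in two different ways, and then to find an operational method of equating them. This time we will decompose the $\mathcal{U}_d$-representation $\mathcal{Q}_\lambda^d \otimes \mathbb{C}^d$, which is input to the CG transform, both by using the $\mathcal{U}_{d-1} \subset \mathcal{U}_d$ branching rules and by applying the CG transform.

First, we might apply the $\mathcal{U}_d$ CG transform and then $\mathcal{U}_{d-1} \subset \mathcal{U}_d$ branching to obtain

(4.11)  $$\bigoplus_{\lambda' \in \lambda + \square} \mathcal{Q}_{\lambda'}^d \stackrel{\mathcal{U}_{d-1}}{\cong} \bigoplus_{\lambda' \in \lambda + \square} \bigoplus_{\mu' \precsim \lambda'} \mathcal{Q}_{\mu'}^{d-1}.$$

Here we can write $\lambda' = \lambda + e_j$ for $j \in [d]$, and can map between $|\lambda, \lambda'\rangle$ and $|\lambda, j\rangle$.

On the other hand, if we restrict to $\mathcal{U}_{d-1}$ first and then apply the $\mathcal{U}_{d-1}$ CG transform, we obtain

(4.12a)  $$\mathcal{Q}_\lambda^d \otimes \mathbb{C}^d \stackrel{\mathcal{U}_{d-1}}{\cong} \bigoplus_{\mu \precsim \lambda} \mathcal{Q}_\mu^{d-1} \otimes \mathbb{C}^d$$

(4.12b)  $$\stackrel{\mathcal{U}_{d-1}}{\cong} \bigoplus_{\mu \precsim \lambda} \mathcal{Q}_\mu^{d-1} \oplus \mathcal{Q}_\mu^{d-1} \otimes \mathbb{C}^{d-1}$$

(4.12c)  $$\stackrel{\mathcal{U}_{d-1}}{\cong} \bigoplus_{\mu \precsim \lambda} \mathcal{Q}_\mu^{d-1} \oplus \bigoplus_{\mu' \in \mu + \square} \mathcal{Q}_{\mu'}^{d-1}$$

(4.12d)  $$\stackrel{\mathcal{U}_{d-1}}{\cong} \bigoplus_{\mu \precsim \lambda} \bigoplus_{\mu' \in \{\mu\} \cup (\mu + \square)} \mathcal{Q}_{\mu'}^{d-1}.$$

In this last step, we can write $\mu' = \mu + e_{j'}$, where $j' \in \{0, 1, \ldots, d - 1\}$ and we have defined $e_0 = 0$; moreover, we can readily map $|\mu, \mu'\rangle$ to and from $|\mu', j'\rangle$.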

We want to perform $U_{\text{CG}}^\lambda$, which is a $\mathcal{U}_d$-invariant operator that maps the LHS of (4.12a) to the LHS of (4.11). Since all the maps in (4.12) and (4.11) are $\mathcal{U}_{d-1}$-invariant, if we pass $U_{\text{CG}}^\lambda$ through each of these isomorphisms we obtain a $\mathcal{U}_{d-1}$-invariant map from (4.12d) to the RHS of (4.11), which we call $T^\lambda$. Next, the fact that $T^\lambda$ commutes with the action of $\mathcal{U}_{d-1}$ means that $T^\lambda$ must act as the identity on each subsystem $\mathcal{Q}_{\mu'}^{d-1}$ and nontrivially only on the multiplicity spaces of $\mathcal{Q}_{\mu'}^{d-1}$, conditioned on $\mu'$. These multiplicity spaces are $d$-dimensional in both (4.12d) and (4.11); in the former they are labeled by $j' \in \{0, \ldots, d - 1\}$ such that $\mu' = \mu + e_{j'}$, while in the latter they are labeled by $j \in [d]$ such that $\lambda' = \lambda + e_j$.
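The claim that both routes produce the same $\mathcal{U}_{d-1}$-irreps with the same multiplicities can be checked by counting labels. The sketch below uses hypothetical helper names and 0-based tuples. It builds the multiset of $\mu'$ labels on the right-hand side of (4.11) and in (4.12d) and compares them.

```python
from collections import Counter
from itertools import product

def interlacing(lam):
    """All mu (one part shorter than lam) that interlace lam."""
    ranges = [range(lam[i + 1], lam[i] + 1) for i in range(len(lam) - 1)]
    return [tuple(mu) for mu in product(*ranges)]

def add_box(lam):
    """Partitions obtained by adding one box to lam without changing its length."""
    out = []
    for j in range(len(lam)):
        if j == 0 or lam[j - 1] > lam[j]:
            out.append(lam[:j] + (lam[j] + 1,) + lam[j + 1:])
    return out

def via_cg_then_branching(lam):
    """Labels mu' on the RHS of (4.11): add a box to lam, then restrict to U_{d-1}."""
    return Counter(mu for nu in add_box(lam) for mu in interlacing(nu))

def via_branching_then_cg(lam):
    """Labels mu' in (4.12d): restrict to U_{d-1}, then keep mu or add a box to it."""
    out = Counter()
    for mu in interlacing(lam):
        out[mu] += 1             # j' = 0, i.e. the input symbol was i = d
        out.update(add_box(mu))  # j' in 1..d-1
    return out

print(via_cg_then_branching((2, 1, 0)) == via_branching_then_cg((2, 1, 0)))
```

The multiplicity of each $\mu'$ in either counter is the number of labels $j$ (respectively $j'$) that actually occur for it, which is at most $d$.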

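Let $T^{\lambda,\mu'}$ denote the restriction of $T^\lambda$ to the multiplicity space of $\mathcal{Q}_{\mu'}^{d-1}$. With a slight abuse of notation, we can write

(4.13)  $$T^\lambda = \sum_{\mu'} |\mu'\rangle\langle\mu'| \otimes I_{\mathcal{Q}_{\mu'}^{d-1}} \otimes T^{\lambda,\mu'}.$$

We call $T^{\lambda,\mu'}$ a reduced Wigner operator. Its matrix elements, given by

(4.14)  $$T^{\lambda,\mu'} = \sum_{j=1}^{d} \sum_{j'=0}^{d-1} T^{\lambda,\mu'}_{j,j'} |j\rangle\langle j'|,$$

are called reduced Wigner coefficients. At the end of this section, we will show how the $T^{\lambda,\mu'}_{j,j'}$ can be efficiently calculated, and thus $T^{\lambda,\mu'}$ and $T^\lambda$ can be efficiently implemented.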

First we show how performing $T^\lambda$ will allow us to implement $U_{\text{CG}}^\lambda$. Since we constructed $T^\lambda$ by composing $U_{\text{CG}}^\lambda$ with the isomorphisms in (4.12) and (4.11), our algorithm for $U_{\text{CG}}^\lambda$ will simply be to apply the isomorphisms in (4.12), apply $T^\lambda$ (or equivalently, $T^{\lambda,\mu'}$ conditioned on $\mu'$), and then reverse the isomorphism in (4.11). A more detailed description of the CG algorithm is as follows:

1.  Start with input $|\lambda\rangle|q\rangle|i\rangle$ with $\lambda \in \mathcal{I}_{d,n}$, $|q\rangle \in Q_\lambda^d$ and $i \in [d]$.

2.  Unpack $|q\rangle$ into $|\mu\rangle|q_{d-1}\rangle$, with $\mu \precsim \lambda$ and $|q_{d-1}\rangle \in Q_\mu^{d-1}$. This can be done efficiently because $|q\rangle$ is expressed in the GZ basis.

3.  If $i \in \{1, \ldots, d - 1\}$ then perform the $\mathcal{U}_{d-1}$ CG transform on $|\mu\rangle|q_{d-1}\rangle|i\rangle$ to obtain output $|\mu'\rangle|q'_{d-1}\rangle|j'\rangle$ with $|q'_{d-1}\rangle \in Q_{\mu'}^{d-1}$ and $\mu' = \mu + e_{j'}$. Otherwise if $i = d$ then simply set $|\mu'\rangle = |\mu\rangle$, $|q'_{d-1}\rangle = |q_{d-1}\rangle$ and replace $|i\rangle = |d\rangle$ with $|j'\rangle = |0\rangle$. The “if/then/else” statement corresponds to (4.12a), while the conditional $\mathcal{U}_{d-1}$ CG transform is the isomorphism applied in (4.12c).

4.  Perform the reduced Wigner transform $T^\lambda$, by applying $T^{\lambda,\mu'}$ conditional on $\mu'$ to map $|j'\rangle$ to $|j\rangle$, as per (4.13) and (4.14).

5.  Map $|\lambda\rangle|j\rangle \to |\lambda'\rangle|j\rangle$ with $\lambda' = \lambda + e_j$.

6.  Pack $|\mu'\rangle$ and $|q'_{d-1}\rangle$ together into a GZ basis vector $|q'\rangle \in Q_{\lambda'}^d$, as in (4.11).

7.  Output $|\lambda'\rangle|q'\rangle|j\rangle$.

Finally  we  describe  how  to  efficiently  implement 
Ta,m  ,  starting  with  an  efficiently-calculable  formula 
for  T*j?  from  Ref.  [46].  First  introduce  the  vectors 

A  :=  A+J2y(d-i)el  and  p  :=  ^+Z)f'=i(d_1_*,)e*' 
(where  we  recall  that  p  =  p!  —  ej').  Also  define  Sj-j' 
to  be  1  if  j  >  j'  and  —1  if  j  <  j' .  Then  according  to 
Eq.  (38)  in  Ref  [46], 

n  y  -  ^s)  n  (/v  -  +  i)i 2 
n  y  -  a.)  n  (/v — a*  + i) 


for  j'  £  {1, . . . ,  d  —  1},  while  for  j'  =  0  we  have 


1 
The components of the partitions here are $O(n)$,
so that each $\hat{T}^{\lambda,\mu'}_{j,j'}$ can be computed in time
poly$(d, \log n)$. We can also perform $\hat{T}^{\lambda,\mu'}$ (and
thus $\hat{T}^{[d]}$) to accuracy $\epsilon$ in time poly$(d, \log n, \log 1/\epsilon)$
by (a) computing all $d^2$ matrix elements of $\hat{T}^{\lambda,\mu'}$
up to precision $\epsilon_1$, (b) decomposing this matrix
into $d^2\,\mathrm{poly}\log(d)$ elementary one- and two-qubit
operations [41], (c) approximating these operations to
accuracy $\epsilon_2$ with products of unitaries drawn from a
fixed finite set (such as Clifford operators and a $\pi/8$
rotation) [44, 45], (d) applying these gates and (e)
uncomputing all of the garbage bits produced by the
classical computation along the way. By appropriate
choice of $\epsilon_1$ and $\epsilon_2$ we achieve a total running time of
poly$(d, \log n, \log 1/\epsilon)$.

5  Conclusion 

We have given an algorithm for the Schur transform
with running time polynomial in the dimension $d$, the
number of qudits $n$, and the accuracy parameter $\log(1/\epsilon)$. This
makes  efficient  a  large  set  of  quantum  information 
protocols[13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23] 
whose  computational  efficiency  has,  prior  to  our 
work,  been  uncertain.  Moreover,  the  existence  of 
an  efficient  Schur  transform  raises  the  possibility  of 
new  quantum  algorithms  using  it  as  a  subroutine. 
Kuperberg's [39] subexponential algorithm for the
dihedral hidden subgroup problem makes use of the
effect of the CG transform for the dihedral group.
Could similar techniques for $\mathcal{U}_d$ be useful? So far
nonabelian quantum Fourier transforms [28, 29] have
not had as many applications as their abelian
counterparts, but perhaps the Schur transform will
provide a fresh perspective.

Our techniques (exploiting $\mathcal{U}_d$-$S_n$ duality and
the structure of their subgroup-adapted bases) could
also be applied to other pairs of groups, generally
known as dual reductive pairs. Some candidates are
discussed in [30, Sect. 5.4], but no applications of these
other transforms are yet known. In general, one can
work with dual reductive pairs by studying either one
of the component groups. This paper focussed on
$\mathcal{U}_d$, but in a companion paper (to appear; see also
[30, Chap. 8]) we will explore connections between
the $S_n$ quantum Fourier transform and the Schur
transform.

Abstract. Schur duality decomposes many copies of a quantum state
into subspaces labeled by partitions, a decomposition with applications
throughout quantum information theory. Here we consider applying Schur
duality to the problem of distinguishing coset states in the standard
approach to the hidden subgroup problem. We observe that simply measuring
the partition (a procedure we call weak Schur sampling) provides very
little information about the hidden subgroup. Furthermore, we show that
under quite general assumptions, even a combination of weak Fourier
sampling and weak Schur sampling fails to identify the hidden subgroup. We
also prove tight bounds on how many coset states are required to solve
the hidden subgroup problem by weak Schur sampling, and we relate this
question to a quantum version of the collision problem.


1  Introduction 

The hidden subgroup problem (HSP) is a central challenge for quantum
computation. On the one hand, many of the known fast quantum algorithms are based
on the efficient solution of the abelian HSP [21,22,38,41]. On the other hand,
the nonabelian HSP has potential applications: in particular, the graph
isomorphism problem can be reduced to the HSP in the symmetric group [8,14], and
the shortest lattice vector problem can be reduced to a variant of the HSP in the
dihedral group [36]. Unfortunately, no efficient algorithms are known for these
two instances of the nonabelian HSP. However, some partial progress has been
made: there is a subexponential-time algorithm for the dihedral HSP [31,37], and
it is known how to solve the HSP efficiently for a variety of other nonabelian
groups [2,16,17,19,25,28,33].

In the HSP for a group $G$, we have black-box access to a function $f : G \to S$,
where $S$ is some finite set. We say that $f$ hides a subgroup $H \le G$ provided
$f(g) = f(g')$ iff $g^{-1}g' \in H$. The goal is to determine $H$ (say, in terms of a
generating set) as quickly as possible. In particular, we say that an algorithm
for the HSP in $G$ is efficient if it runs in time poly$(\log|G|)$.

Nearly all quantum algorithms for the HSP use the so-called standard method,
in which we query $f$ on a uniform superposition of group elements and then
discard the function value, giving a coset state $|gH\rangle := |H|^{-1/2}\sum_{h\in H}|gh\rangle$ for
some unknown, uniformly random $g \in G$. This state is described by the density
matrix

$$\rho_H := \frac{1}{|G|}\sum_{g\in G}|gH\rangle\langle gH| = \frac{1}{|G|}\sum_{h\in H}R(h) \qquad (1)$$

(called a hidden subgroup state), where $R$ is the right regular representation of
$G$, satisfying $R(g)|g'\rangle = |g'g^{-1}\rangle$ for all $g, g' \in G$. Now the HSP is reduced to the
problem of distinguishing the states $\rho_H$ for the possible $H \le G$.

The symmetry of $\rho_H$ can be exploited using Fourier analysis. In particular, the
group algebra $\mathbb{C}G$ decomposes under the commuting left and right multiplication
actions of $G$ as

$$\mathbb{C}G \cong \bigoplus_{\sigma\in\hat{G}} V_\sigma \otimes V_\sigma^* \qquad (2)$$

where $\hat{G}$ denotes a complete set of irreducible representations (or irreps) of $G$,
and $V_\sigma$ and $V_\sigma^*$ are the (row and column, respectively) subspaces acted on by
$\sigma \in \hat{G}$. The unitary transformation that relates the standard basis for $\mathbb{C}G$ and
the basis for the spaces $V_\sigma \otimes V_\sigma^*$ is the Fourier transform, which can be carried
out efficiently for most groups of interest [7,12,23,32].

Since $\rho_H$ is invariant under the left multiplication action of $G$, the
decomposition (2) shows that it is block diagonal in the Fourier basis, with blocks labeled
by the irreps $\sigma \in \hat{G}$. For each $\sigma$, there is a $\dim V_\sigma \times \dim V_\sigma$ block that appears
$\dim V_\sigma$ times (or in other words, the state is maximally mixed in the row space).
Thus, without loss of information, we can measure the irrep name $\sigma$ and discard
the information about which $\sigma$-isotypic block occurred.

The process of measuring the irrep name $\sigma$ is referred to as weak Fourier
sampling. For most nonabelian groups (including the symmetric group [19,25] and
the dihedral group), weak Fourier sampling alone produces insufficient
information to identify the hidden subgroup $H$. To obtain further information about $H$,
we must perform a refined measurement inside the resulting subspace. This is
referred to as strong Fourier sampling, and there are many possible ways to do
it, especially if $G$ has large irreps.

Of course, with either weak or strong Fourier sampling, a single hidden
subgroup state is not sufficient to determine $H$: we must repeat the sampling
procedure to obtain statistics. However, repeating strong Fourier sampling a
polynomial number of times is not sufficient for some groups (such as the symmetric
group), even if measurements can be chosen adaptively and unlimited classical
processing is allowed [34]. To solve the HSP in general, we must perform a joint
measurement on $k = \mathrm{poly}(\log|G|)$ copies of $\rho_H$. In fact, there are groups (again
including the symmetric group) for which the measurement must be entangled
across $\Omega(\log|G|)$ copies [24]. Thus the difficulty of the general HSP may be
attributed at least in part to the fact that highly entangled measurements are
required. While $O(\log|G|)$ copies are always information-theoretically sufficient
[15] (so that, in particular, the query complexity of the HSP is polynomial), there
are many groups for which it is not known how to efficiently extract the identity
of the hidden subgroup.

Although previous work on the HSP has focused almost exclusively on Fourier
sampling, there is another measurement that can also be performed without loss
of information. The idea is to exploit the symmetry of $\rho_H^{\otimes k}$ under permutations of
the $k$ registers. Thus, we should consider the decomposition of $(\mathbb{C}G)^{\otimes k}$ afforded
by Schur duality [18], which decomposes $k$ copies of a $d$-dimensional space as

$$(\mathbb{C}^d)^{\otimes k} \cong \bigoplus_{\lambda\vdash k} \mathcal{P}_\lambda \otimes \mathcal{Q}_\lambda^d \qquad (3)$$

where the symmetric group $S_k$ acts to permute the $k$ registers and the unitary
group $\mathcal{U}_d$ acts identically on each register. The subspaces $\mathcal{P}_\lambda$ and $\mathcal{Q}_\lambda^d$ correspond
to irreps of $S_k$ and $\mathcal{U}_d$, respectively. They are labeled by partitions $\lambda$ of $k$ (denoted
$\lambda \vdash k$), i.e., $\lambda = (\lambda_1, \lambda_2, \ldots)$ where $\lambda_1 \ge \lambda_2 \ge \ldots$ and $\sum_j \lambda_j = k$. (We can restrict
our attention to partitions with at most $d$ parts, since $\dim \mathcal{Q}_\lambda^d = 0$ if $\lambda_{d+1} > 0$.)

Since $\rho_H^{\otimes k}$ is invariant under the action of $S_k$, the decomposition (3) shows that
it is block diagonal in the Schur basis with blocks labeled by $\lambda \vdash k$. For each $\lambda$,
there is a $\dim \mathcal{Q}_\lambda^d \times \dim \mathcal{Q}_\lambda^d$ block that appears $\dim \mathcal{P}_\lambda$ times (or in other words,
the state is maximally mixed in the permutation space). Thus, no information
is lost if we measure the partition $\lambda$ and discard the permutation register. By
analogy to weak Fourier sampling, we refer to the process of measuring $\lambda$ as weak
Schur sampling. This is a natural measurement to consider not only because it
can be performed without loss of information, but also because it is a joint
measurement of all $k$ registers, and we know that some measurement of this
kind is required to solve the general HSP. Unfortunately, we will see in Section 2
(and see also Corollary 4 below) that weak Schur sampling with $k = \mathrm{poly}(\log|G|)$
provides insufficient information to solve the HSP unless the hidden subgroup is
very large (in which case the problem is easy, even for a classical computer).

In fact, since both weak Fourier sampling and weak Schur sampling can be
performed without loss of information, it is possible to perform both measurements
simultaneously (with the caveat that we must discard the irrelevant information
about the order in which the irreps of $G$ appear). Even though the statistics of
the irrep name $\sigma$ and the partition $\lambda$ do not provide enough information to
identify the hidden subgroup, this does not preclude the possibility that their joint
distribution is more informative. However, we will see in Section 3 that unless
we are likely to see the same representation more than once under weak Fourier
sampling (which is typically not the case), the Fourier and Schur distributions
are nearly uncorrelated. Formally, we have

Theorem 1 (Failure of weak Fourier-Schur sampling). The probability
that weak Fourier-Schur sampling (defined in Section 3) applied to $\rho_H^{\otimes k}$ (defined
in (1)) provides a result that depends on $H$ is at most $k^2 d_{\max}^2 |H|/|G|$, where
$d_{\max}$ is the largest dimension of an irrep of $G$.

This implies that $k$ needs to be large for most cases of interest, including the
dihedral and symmetric groups.

Corollary 2 (Weak Fourier-Schur sampling on $D_N$ and $S_n$). (a) Weak
Fourier-Schur sampling on the dihedral group $D_N$ cannot distinguish the trivial
subgroup from a hidden reflection with constant advantage (i.e., success
probability $\frac{1}{2} + \Omega(1)$) unless $k = \Omega(\sqrt{N})$. (b) Weak Fourier-Schur sampling on
the symmetric group $S_n$ or on the wreath product $S_n \wr \mathbb{Z}_2$ cannot distinguish
the trivial subgroup from an order 2 subgroup with constant advantage unless
$k = \exp(\Omega(\sqrt{n}))$.
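
To make part (a) concrete, the following minimal sketch evaluates the Theorem 1 bound $k^2 d_{\max}^2 |H|/|G|$ for a hidden reflection. The function names are hypothetical, and the sketch assumes the convention that $D_N$ has order $2N$ with irreps of dimension at most 2, so the bound becomes $4k^2/N$ and stays small until $k$ is of order $\sqrt{N}$.

```python
def fourier_schur_bound(k, d_max, order_H, order_G):
    """Theorem 1 bound k^2 * d_max^2 * |H| / |G|, capped at 1 since it bounds a probability."""
    return min(1.0, k * k * d_max * d_max * order_H / order_G)


def dihedral_reflection_bound(k, N):
    """Bound for a hidden reflection in D_N (assumed order 2N, irreps of dimension <= 2)."""
    return fourier_schur_bound(k, d_max=2, order_H=2, order_G=2 * N)


if __name__ == "__main__":
    N = 10**6
    for k in (10, 100, 1000):
        # The bound only becomes trivial once k is comparable to sqrt(N).
        print(k, dihedral_reflection_bound(k, N))
```

The cap at 1 is only a convenience: past that point the bound carries no information.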

The proof that weak Schur sampling fails is based on the simple observation
that distinguishing the trivial subgroup from a subgroup of order $|H|$ in this
way requires us to distinguish 1-to-1 from $|H|$-to-1 functions on $G$, i.e., to solve
the $|H|$-collision problem for a list of size $|G|$. Since there is an $\Omega(\sqrt[3]{|G|/|H|})$
quantum lower bound for this problem [1], poly$(\log|G|)$ registers are insufficient.
In fact, the problem resulting from the HSP is potentially harder, since the basis
in which the collisions occur is inaccessible to the Schur measurement. This
naturally leads to the notion of a quantum collision problem, and raises the
question of how quickly it can be solved on a quantum computer, which we
discuss in Section 4.

We first consider a sampling version of the quantum $r$-collision problem. Using
results on the asymptotics of the Plancherel measure on the symmetric group,
we prove that $k = \Theta(d/r)$ registers are necessary and sufficient to solve this
problem. In particular, we have

Theorem 3 (Quantum collision sampling problem). Given $\rho^{\otimes k}$,
distinguishing between [case A] $\rho = I/d$ and [case B] $\rho^2 = \frac{r}{d}\rho$ (i.e., $\rho$ is
proportional to a projector of rank $d/r$) is possible with success probability
$1 - \exp(-\Theta(kr/d))/2$. In particular, constant advantage is possible iff $k = \Omega(d/r)$.

In addition to providing the first results on estimation of the spectrum of a
quantum state in the regime where $k \ll d^2$, this gives tight estimates of the
effectiveness of weak Schur sampling, which we see requires an exponentially
large (in $\log|G|$) number of copies to be successful.

Corollary 4 (Failure of weak Schur sampling). Applying weak Schur
sampling to $\rho_H^{\otimes k}$ (where $\rho_H$ is defined in (1)), one can distinguish the case $|H| \ge r$
from the case $H = \{1\}$ with constant advantage iff $k = \Omega(|G|/r)$.

The  connection  between  Theorem  3  and  Corollary  4  is  explained  in  Section  2. 

In Section 4 we also introduce a black box version of the quantum collision
problem. We show that it can be solved using $O(\sqrt[3]{d/r}\,\log d/r)$ queries, nearly
matching the query lower bound from the classical problem.

2 Weak Schur Sampling


We begin by considering only the permutation symmetry of $\rho_H^{\otimes k}$, without taking
into account symmetry resulting from the group $G$. In other words, we consider
only the Schur decomposition (3), and we perform weak Schur sampling, i.e., a
measurement of the partition $\lambda$.

The projector onto the subspace labeled by a particular $\lambda \vdash k$ is

$$\Pi_\lambda := \frac{\dim \mathcal{P}_\lambda}{k!} \sum_{\pi\in S_k} \chi_\lambda(\pi)\, P(\pi) \qquad (4)$$

(see e.g. [40, Theorem 8]), where $\chi_\lambda$ is the character of the irrep of $S_k$ labeled by $\lambda$
and $P$ is the (reducible) representation of $S_k$ that acts to permute the $k$ registers,
i.e., $P(\pi)|i_1\rangle \cdots |i_k\rangle = |i_{\pi^{-1}(1)}\rangle \cdots |i_{\pi^{-1}(k)}\rangle$ for all $i_1, \ldots, i_k \in \{1, \ldots, d\}$. For any
$d^k$-dimensional density matrix $\gamma$, the distribution under weak Schur sampling is

$$\Pr(\lambda|\gamma) = \mathrm{tr}(\Pi_\lambda \gamma) . \qquad (5)$$

To use weak Schur sampling in a quantum algorithm, it is important that
the measurement of $\lambda$ can be done efficiently. The simplest implementation of
the complete Schur transform [5], which fully resolves the subspaces $\mathcal{P}_\lambda$ and $\mathcal{Q}_\lambda^d$,
runs in time poly$(k, d)$, and thus is inefficient when $d$ is exponentially large, as
in the HSP. It can be modified to run in time poly$(k, \log d)$ either by a relabeling
trick [26, footnote in Section 8.1.2] or by generalized phase estimation [4,26]
(which may be viewed as a generalization of the well-known swap test [6,10]).
Generalized phase estimation only allows us to measure $\lambda$, but for weak Schur
sampling this is all we need. In this procedure, we prepare an ancilla register in
the state $\frac{1}{\sqrt{k!}} \sum_{\pi\in S_k} |\pi\rangle$, use it to perform a conditional permutation $P(\pi)$ on
the input state $\gamma$, and then perform an inverse Fourier transform over $S_k$ [7] on
the ancilla register. Measurement of the ancilla register will then yield $\lambda \in \hat{S}_k$,
interpreted as a partition of $k$, distributed according to (5).

The distribution of $\lambda$ according to weak Schur sampling is invariant under
the actions of the permutation and unitary groups, since these groups act only
within the subspaces $\mathcal{P}_\lambda$ and $\mathcal{Q}_\lambda^d$, respectively. In other words, for any
$U \in \mathcal{U}_d$, any $\pi \in S_k$, and any $d^k$-dimensional density matrix $\gamma$, we have
$\Pr(\lambda|\gamma) = \Pr(\lambda \,|\, P(\pi) U^{\otimes k} \gamma\, (U^\dagger)^{\otimes k} P(\pi)^\dagger)$. In particular, the invariance under $U^{\otimes k}$ implies
that for $\gamma = \rho^{\otimes k}$, the distribution according to weak Schur sampling depends
only on the spectrum of $\rho$.
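
As a small illustration of this spectrum dependence, consider $k = 2$, where the two partitions correspond to the symmetric and antisymmetric subspaces with projectors $(I \pm \mathrm{SWAP})/2$. Since $\mathrm{tr}[\mathrm{SWAP}\,(\rho\otimes\rho)] = \mathrm{tr}\,\rho^2$, the outcome probabilities are functions of the eigenvalues alone. The sketch below is only an illustration; the function name is hypothetical.

```python
from fractions import Fraction


def weak_schur_k2(spectrum):
    """Return (Pr[symmetric], Pr[antisymmetric]) for two copies of a state.

    `spectrum` lists the eigenvalues of rho; only tr(rho^2) enters.
    """
    purity = sum(r * r for r in spectrum)   # tr(rho^2)
    p_sym = (1 + purity) / 2                # tr[(I + SWAP)/2 (rho x rho)]
    return p_sym, 1 - p_sym


if __name__ == "__main__":
    d = 4
    mixed = [Fraction(1, d)] * d                              # rho = I/d
    half_rank = [Fraction(2, d)] * (d // 2) + [0] * (d // 2)  # rank d/2
    print(weak_schur_k2(mixed))
    print(weak_schur_k2(half_rank))
```

Any two states with the same eigenvalues give identical outputs here, whatever their eigenbases.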

Now it is easy to see that weak Schur sampling on $k = \mathrm{poly}(\log|G|)$ copies of
$\rho_H$ provides insufficient information to solve the HSP. The state $\rho_H$ is
proportional to a projector of rank $|G|/|H|$, since

$$\rho_H^2 = \frac{1}{|G|^2} \sum_{h,h'\in H} R(hh') = \frac{|H|}{|G|}\, \rho_H . \qquad (6)$$

Because the distribution of measurement outcomes $\Pr(\lambda|\rho_H^{\otimes k})$ depends only on
the spectrum of $\rho_H$, and this spectrum depends only on $|H|$, different subgroups
of the same order cannot be distinguished by weak Schur sampling. In fact, even
distinguishing the trivial hidden subgroup from a hidden subgroup of order $|H| \ge 2$
(which would suffice for, e.g., graph isomorphism) requires an exponential
number of hidden subgroup states.
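
Equation (6) can be checked by brute force on a small group. The sketch below is an illustration only, with hypothetical helper names: it builds $\rho_H = \frac{1}{|G|}\sum_{h\in H} R(h)$ for $G = S_3$ and an order-2 subgroup using exact rational arithmetic, so that $\mathrm{tr}\,\rho_H = 1$ and $\rho_H^2 = \frac{|H|}{|G|}\rho_H$ can be compared exactly.

```python
from fractions import Fraction
from itertools import permutations


def compose(p, q):
    """Group product p*q of permutations given as tuples (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def hidden_subgroup_state(G, H):
    """rho_H = (1/|G|) sum_{h in H} R(h), where R(h)|g> = |g h^{-1}>."""
    index = {g: a for a, g in enumerate(G)}
    n = len(G)
    rho = [[Fraction(0)] * n for _ in range(n)]
    for h in H:
        h_inv = inverse(h)
        for g in G:
            rho[index[compose(g, h_inv)]][index[g]] += Fraction(1, n)
    return rho


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


G = list(permutations(range(3)))   # the symmetric group S_3
H = [(0, 1, 2), (1, 0, 2)]         # identity and one transposition
rho = hidden_subgroup_state(G, H)
ratio = Fraction(len(H), len(G))   # |H|/|G|
```

Because $\rho_H$ has trace 1 and satisfies (6), its nonzero eigenvalues all equal $|H|/|G|$, which gives the rank $|G|/|H|$ (here 3).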

Suppose that weak Schur sampling could distinguish between hidden subgroup
states corresponding to $H = \{1\}$ and some particular $H$ of order $|H| \ge 2$. Since
the distribution of $\lambda$ depends only on the spectrum, this would mean that we
could distinguish $k$ copies of the maximally mixed state $I_{|G|}/|G|$, where $I_d$ is the
$d \times d$ identity matrix, from $k$ copies of the state $J_{|G|/|H|}/(|G|/|H|)$, where $J_{d'}$
is a projector onto an arbitrary subspace of dimension $d'$. This in turn would
imply that we could distinguish 1-to-1 functions from $|H|$-to-1 functions using
$k$ queries of the function. Then the quantum lower bound for the $|H|$-collision
problem [1] shows that $k = \Omega(\sqrt[3]{|G|/|H|})$ copies are required.

Of course, this does not mean that $O(\sqrt[3]{|G|/|H|})$ copies are sufficient. In fact,
it turns out that a linear number of copies is both necessary and sufficient, as
we will show by a more careful analysis in Section 4. There we will sketch the
proof of Theorem 3, which by the arguments of this section implies Corollary 4.


3  Weak  Fourier-Schur  Sampling 


In the previous section, we showed that weak Schur sampling provides insufficient
information to efficiently solve the HSP. However, even though weak Fourier
sampling typically also does not provide enough information, it is conceivable
that the joint distribution of the two measurements could be substantially more
informative. In this section, we will see that this is not the case: provided weak
Fourier sampling fails, so does weak Fourier-Schur sampling.

Since neither measurement constitutes a loss of information, it is in
principle possible to perform both weak Fourier sampling and weak Schur sampling
simultaneously. If we perform weak Fourier sampling in the usual way,
measuring the irrep label for each register, then we will typically obtain a state that
is no longer permutation invariant. However, since the irrep labels are
identically distributed for each register, the order in which the irreps appear carries
no information. Only the type of the irreps, i.e., the number of times each irrep
appears, is relevant. Thus, it suffices to perform what we might call weak Fourier
type sampling, in which we only measure the irrep type. Equivalently, we could
perform complete weak Fourier sampling and then either randomly permute the
$k$ registers, or perform weak Schur sampling and discard the $\mathcal{P}_\lambda$ register.

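We begin by performing weak Fourier sampling. The hidden subgroup state
$\rho_H$ defined in (1) has the following block structure in the Fourier basis:

$$\rho_H \cong \frac{1}{|G|} \bigoplus_{\sigma\in\hat{G}} I_{\dim V_\sigma} \otimes \sum_{h\in H} \sigma(h)^* = \sum_{\sigma\in\hat{G}} \Pr(\sigma)\, \frac{I_{\dim V_\sigma}}{\dim V_\sigma} \otimes \rho_{H,\sigma} . \qquad (7)$$

Here the probability of observing the irrep $\sigma$ under weak Fourier sampling is
$\Pr(\sigma) = (\dim V_\sigma/|G|) \sum_{h\in H} \chi_\sigma(h)^*$ and the state conditioned on this
observation is $\rho_{H,\sigma} = \big(\sum_{h\in H} \chi_\sigma(h)\big)^{-1} \sum_{h\in H} |\sigma\rangle\langle\sigma| \otimes \sigma(h)^*$.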
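Repeating weak Fourier sampling $k$ times, we get $\rho_{H,\boldsymbol{\sigma}} := \rho_{H,\sigma_1} \otimes \cdots \otimes \rho_{H,\sigma_k}$,
where $\boldsymbol{\sigma} := (\sigma_1, \sigma_2, \ldots, \sigma_k) \in \hat{G}^k$ may be viewed either as the actual outcome of
the $k$ instances of weak Fourier sampling, or merely as a representative of the
irrep type, as discussed above. Given this state, the conditional probability of
observing the partition $\lambda$ is

$$\Pr(\lambda|\boldsymbol{\sigma}) = \mathrm{tr}(\Pi_\lambda\, \rho_{H,\boldsymbol{\sigma}}) = \frac{\dim \mathcal{P}_\lambda}{k!} \sum_{\pi\in S_k} \chi_\lambda(\pi)\, \mathrm{tr}[P(\pi)\, \rho_{H,\boldsymbol{\sigma}}] . \qquad (8)$$

Note that $\mathrm{tr}[P(\pi)\, \rho_{H,\boldsymbol{\sigma}}] = 0$ if $\pi(\boldsymbol{\sigma}) \ne \boldsymbol{\sigma}$, where $\pi(\boldsymbol{\sigma}) = (\sigma_{\pi^{-1}(1)}, \ldots, \sigma_{\pi^{-1}(k)})$.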

Proof (Theorem 1). Assume that $\boldsymbol{\sigma}$ is multiplicity-free, i.e., that all the $\sigma_i$'s
are different. In this case the traces are zero for all $\pi \ne 1$ (the identity of
$S_k$). Then $\Pr(\lambda|\boldsymbol{\sigma}) = \frac{\dim \mathcal{P}_\lambda}{k!} \chi_\lambda(1)\, \mathrm{tr}\, \rho_{H,\boldsymbol{\sigma}} = \frac{(\dim \mathcal{P}_\lambda)^2}{k!}$, which is nothing but the
Plancherel distribution over $S_k$, and which in particular is independent of the
hidden subgroup $H$. This shows that we cannot extract any information about
$H$ provided that we have obtained a multiplicity-free $\boldsymbol{\sigma}$.

Finally, we can use $|\chi_\sigma(h)| \le \dim V_\sigma$ to show that the probability of any $\sigma$ is
$\le d_{\max}^2 |H|/|G|$, and then use a union bound to prove that $\boldsymbol{\sigma}$ is multiplicity-free
with probability $\ge 1 - \binom{k}{2} d_{\max}^2 |H|/|G|$.

In  [11]  two  of  us  considered  an  alternative  approach  to  graph  isomorphism  based 
on  the  nonabelian  hidden  shift  problem.  It  can  be  shown  that  weak  Fourier-Schur 
sampling  fails  for  similar  reasons  when  applied  to  hidden  shift  states  instead  of 
hidden  subgroup  states. 

4  The  Quantum  Collision  Problem 

In  Section  2,  we  saw  that  weak  Schur  sampling  cannot  efficiently  solve  the  HSP 
since  this  would  require  solving  the  collision  problem.  In  fact,  the  problem  faced 
by  weak  Schur  sampling  is  considerably  harder,  since  no  information  is  available 
about  the  basis  in  which  collisions  occur.  This  motivates  quantum  generalizations 
of  the  usual  (i.e.,  classical)  collision  problem,  which  we  study  in  this  section. 

Let us briefly review the classical problem. The classical $r$-collision problem is
the problem of determining whether a black box function with $d$ inputs (where
$r$ divides $d$) is 1-to-1 or $r$-to-1. This problem has classical (randomized) query
complexity $\Theta(\sqrt{d/r})$, as evidenced by the well-known birthday problem, and
quantum query complexity $\Theta(\sqrt[3]{d/r})$ [1,9]. The classical algorithm is quite
simple: after querying the function on $O(\sqrt{d/r})$ random inputs, there is a
reasonable probability of seeing a collision, provided one exists. The quantum
algorithm is slightly more subtle, making use of Grover's algorithm for unstructured
search [20]. In particular, while the classical algorithm queries the black box
nonadaptively, it is essential for the quantum algorithm to make adaptive queries.

Here  we  first  consider  a  sampling  version  of  the  quantum  collision  problem, 
which  is  closely  connected  to  the  weak  Schur  sampling  approach  to  the  HSP,  and 
then  study  a  full-fledged  black  box  version  of  the  problem. 
The quantum collision sampling problem. The quantum $r$-collision
sampling problem is the problem of deciding whether one has $k$ copies of the
$d$-dimensional maximally mixed state or of a state that is maximally mixed on
an unknown subspace of dimension $d/r$. This is exactly the problem faced by
the weak Schur sampling approach to the HSP, so our results on the quantum
collision sampling problem give tight bounds on the effectiveness of weak Schur
sampling. It turns out that $k = \Theta(d/r)$ copies are necessary and sufficient to
distinguish these two cases with constant advantage, as stated by Theorem 3.

Proof sketch (Theorem 3). Weak Schur sampling is the optimal strategy to
distinguish states $\rho$ with [case A] $\rho = I/d$ or [case B] $\rho^2 = \frac{r}{d}\rho$. We call the resulting
distribution of $\lambda \vdash k$ arising in case A the Schur distribution, Schur$(k, d)$, with

$$\Pr(\lambda) = \frac{\dim \mathcal{P}_\lambda \, \dim \mathcal{Q}_\lambda^d}{d^k} = \frac{(\dim \mathcal{P}_\lambda)^2}{k!} \prod_{(i,j)\in\lambda} \left(1 + \frac{j-i}{d}\right) . \qquad (9)$$

The second equality follows from Stanley's formula for $\dim \mathcal{Q}_\lambda^d$ [42], interpreting
$\lambda$ as a Young diagram, where $(i, j) \in \lambda$ iff $1 \le j \le \lambda_i$. The outcomes in case B
are also Schur-distributed (by a simple representation-theoretic argument), but
here the distribution is Schur$(k, d/r)$.

Our first goal is to show that the distributions Schur$(k, d)$ and Schur$(k, d/r)$
are close when $k \ll d/r$. We do this by showing that when $k \ll d$, Schur$(k, d)$ is
close to the Plancherel distribution of $\lambda \vdash k$, Planch$(k)$, for which

$$\Pr(\lambda) = \frac{(\dim \mathcal{P}_\lambda)^2}{k!} . \qquad (10)$$

Using (9) and (10), the $\ell_1$ distance $\Delta_{k,d} := \|\mathrm{Schur}(k, d) - \mathrm{Planch}(k)\|_1$ is

$$\Delta_{k,d} = \mathop{\mathbb{E}}_{\lambda\vdash k} \left| \prod_{(i,j)\in\lambda} \left(1 + \frac{j-i}{d}\right) - 1 \right| \qquad (11)$$

where the expectation is over Planch$(k)$. Using Cauchy-Schwarz and the
inequality $1 + x \le e^x$, we can upper bound (11) by

$$\Delta_{k,d}^2 \le \mathop{\mathbb{E}}_{\lambda\vdash k} \exp\Big(2 \sum_{(i,j)\in\lambda} \frac{j-i}{d}\Big) - 1 = \sum_{m\ge 1} \frac{2^m}{m!\, d^m} \mathop{\mathbb{E}}_{\lambda\vdash k} \mu(\lambda)^m ,$$

where $\mu(\lambda) := \sum_{(i,j)\in\lambda} (j - i)$. Finally, we use calculations of the moments of $\mu$
obtained by Kerov in the course of describing the asymptotically Gaussian
fluctuations about the limiting shape of the typical diagram under the Plancherel
distribution [29]. This establishes $\Delta_{k,d} \le \sqrt{2}\,(k/d)$, and it follows from the
triangle inequality that Schur$(k, d)$ and Schur$(k, d/r)$ are close when $k \ll d/r$.
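
For small $k$ the two distributions can be tabulated exactly, which gives a quick numerical check of (9), (10) and the bound on $\Delta_{k,d}$. The sketch below is an illustration with hypothetical function names: it enumerates partitions, computes $\dim \mathcal{P}_\lambda$ by the hook length formula (a standard fact not stated in the text), and applies the product over contents $j - i$ from (9).

```python
from fractions import Fraction
from math import factorial


def partitions(k, max_part=None):
    """Yield the partitions of k as non-increasing tuples."""
    if max_part is None:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest


def dim_P(lam):
    """Dimension of the S_k irrep labeled by lam (hook length formula)."""
    k = sum(lam)
    columns = [sum(1 for part in lam if part > j) for j in range(lam[0])] if lam else []
    hooks = 1
    for i, row in enumerate(lam):
        for j in range(row):
            hooks *= (row - j - 1) + (columns[j] - i - 1) + 1  # arm + leg + 1
    return factorial(k) // hooks


def planch(lam):
    """Plancherel probability (10)."""
    return Fraction(dim_P(lam) ** 2, factorial(sum(lam)))


def schur(lam, d):
    """Schur(k, d) probability (9): Plancherel weight times prod (1 + (j - i)/d)."""
    p = planch(lam)
    for i, row in enumerate(lam):
        for j in range(row):
            p *= 1 + Fraction(j - i, d)
    return p


def l1_distance(k, d):
    """Delta_{k,d} = || Schur(k, d) - Planch(k) ||_1."""
    return sum(abs(schur(lam, d) - planch(lam)) for lam in partitions(k))
```

For example, `l1_distance(2, d)` equals $1/d$ exactly, comfortably below $\sqrt{2}\,(2/d)$; diagrams with more than $d$ rows automatically get probability zero because the cell in row $d+1$, column 1 contributes a factor $1 - d/d$.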

Conversely, we would like to show that if $k \gg d/r$, then Schur$(k, d)$ is far
from Schur$(k, d/r)$. We do this by first proving a lower bound on $\Delta_{k,d}$ (using
similar techniques as in the upper bound on $\Delta_{k,d}$, as well as a one-sided
Chebyshev inequality showing $\mu(\lambda)^2 \ge \Omega(k^2)$ with constant probability). Then we




A.M.  Childs,  A.W.  Harrow,  and  P.  Wocjan 


combine this with the upper bound on $\Delta_{k,d}$ and use a monotonicity argument
($\|\mathrm{Schur}(k, d_1) - \mathrm{Schur}(k, d_2)\|_1 \ge \|\mathrm{Schur}(k, r d_1) - \mathrm{Schur}(k, r d_2)\|_1$) to separate
the Schur distributions. This completes the proof sketch.

To put Theorem 3 in context, we can compare it to results on spectrum
estimation. When $k \to \infty$ with $d$ fixed, applying the measurement $\{\Pi_\lambda\}$ to
$\rho^{\otimes k}$ and outputting $\bar\lambda := \lambda/k$ has long been known to be a valid estimator of
the spectrum of $\rho$ [30]. Indeed, if $r_1 \ge \ldots \ge r_d$ are the eigenvalues of $\rho$, then
$\mathrm{tr}\, \Pi_\lambda \rho^{\otimes k} \le (k+1)^{d(d-1)/2} \exp(-k D(\bar\lambda \| r))$, where $D(p\|q) := \sum_i p_i \log(p_i/q_i)$ is
the (classical) relative entropy [13,27]. This inequality is usually only interesting
when $k = \Omega(d^2)$, so our Theorem 3 can be viewed as the first positive result for
spectrum estimation in the regime where $k = o(d^2)$.

A black box for the quantum collision problem. A complete definition of
the quantum collision problem requires us to specify a unitary black box that
hides the function, and that allows us to make adaptive queries. We now propose
one such definition, and show that the resulting quantum $r$-collision problem can
be solved in $O(\sqrt[3]{d/r}\,\log d/r)$ queries, nearly matching the $\Omega(\sqrt[3]{d/r})$ lower bound
from the classical collision problem.

Consider a quantum oracle that implements the isometry $|i\rangle \mapsto |i\rangle|\psi_{f(i)}\rangle$,
where $B := \{|\psi_1\rangle, \ldots, |\psi_d\rangle\}$ is an arbitrary (unknown) orthonormal basis of $\mathbb{C}^d$
and $f$ is either a 1-to-1 function or an $r$-to-1 function. The goal is to determine
which is the case using as few queries as possible. We assume that the isometry
is extended to a unitary operator $R$ acting on $\mathbb{C}^d \otimes \mathbb{C}^d$ by $|i\rangle|y\rangle \mapsto |i\rangle\, U|y \oplus f(i)\rangle$,
where $U := \sum_i |\psi_i\rangle\langle i|$ is the unitary matrix effecting a transformation from the
standard basis to $B$. We also assume we can perform its inverse $R^\dagger$.

By considering the case where the basis $B$ (or equivalently $U$) is known, it is
clear that the quantum lower bound for the usual collision problem implies an
$\Omega(\sqrt[3]{d/r})$ lower bound for the quantum collision problem as well. We present
an algorithm for this problem that uses only $O(\sqrt[3]{d/r}\,\log d/r)$ queries. The
basic idea is to adapt the quantum algorithm for the classical collision problem
[9]. That algorithm is not directly applicable to the quantum problem since we
cannot check equality of quantum states. However, the swap test can determine
whether two states are identical or orthogonal with one-sided error of 1/2. With
$O(\log d)$ copies of each state, this error (and the resulting state disturbance) can
be reduced to $1/\mathrm{poly}(d)$. We use this amplified swap test to prove

Theorem  5.  The  query  complexity  of  the  quantum  r-collision  problem  for  a  list 
of size $d$ is $O(\sqrt[3]{d/r}\,\log(d/r))$.

Proof.  We  first  outline  the  quantum  algorithm  of  [9]  for  the  classical  collision 
problem. The algorithm builds a table of a random set of $\sqrt[3]{d/r}$ items and uses
Grover’s  algorithm  to  search  the  remaining  items  for  a  collision  with  an  entry 
of the table. The entries of the table are distinct with high probability. If $f$ is
$r$-to-1, there are $(r-1)(d/r)^{1/3}$ solutions among $\le d$ items, for a total query
complexity of $O\bigl(\sqrt{d/[r(d/r)^{1/3}]}\bigr) = O((d/r)^{1/3})$.
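
The exponent arithmetic behind this count can be spelled out as follows. This is a sketch with constant factors suppressed, assuming one query per table entry and replacing $r-1$ by $r$ inside the $O(\cdot)$, as the bound above does.

```latex
\[
  \underbrace{(d/r)^{1/3}}_{\text{table construction}}
  \;+\;
  \underbrace{\sqrt{\frac{d}{r\,(d/r)^{1/3}}}}_{\text{Grover search}}
  \;=\; (d/r)^{1/3} + \sqrt{(d/r)^{2/3}}
  \;=\; O\bigl((d/r)^{1/3}\bigr),
  \qquad\text{since}\quad
  \frac{d}{r\,(d/r)^{1/3}} = (d/r)^{1 - 1/3}.
\]
```

The table size $(d/r)^{1/3}$ is the choice that balances the two terms.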

Now we adapt this algorithm to the quantum problem. Using the amplified
swap test, we can effectively test equality using $m = 2 + 2\log(d/r)$ copies of the
quantum states, increasing the query complexity only by a factor of $O(\log(d/r))$.
For this to work, it is important that we can reuse the states corresponding to
the entries in the table, so we will need $m$ copies of each state in the table as
well. Iterating this swap test, we find that the error after $\ell$ Grover iterations
is at most $\ell \cdot 2^{1-m/2} \le \ell r/d$. Since the number of Grover iterations is $\ell =
O((d/r)^{1/3})$, the total error is asymptotically negligible, and we obtain nearly
the same performance as in the classical collision problem.
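
The quantities in this argument can be checked numerically with a short Python sketch. It rests on several assumptions that are not stated in the text: the logarithm is taken base 2 (the choice that makes $2^{1-m/2} = r/d$), the hidden constant in $\ell = O((d/r)^{1/3})$ is set to 1, and $m$ is not rounded to an integer. The acceptance probability $(1 + |\langle a|b\rangle|^2)/2$ is the standard swap-test formula; the function names are hypothetical, and state disturbance is not modeled.

```python
import math


def swap_test_accept_prob(a, b):
    """Probability that one swap test reports 'equal' on unit vectors a and b."""
    overlap = sum(x.conjugate() * y for x, y in zip(a, b))
    return (1 + abs(overlap) ** 2) / 2


def copies_per_state(d, r):
    """m = 2 + 2 log(d/r), with the logarithm assumed to be base 2."""
    return 2 + 2 * math.log2(d / r)


def total_error_bound(d, r):
    """Evaluate ell * 2^(1 - m/2) with ell = (d/r)^(1/3) (constant assumed 1)."""
    m = copies_per_state(d, r)
    ell = (d / r) ** (1 / 3)
    return ell * 2 ** (1 - m / 2)
```

Identical states are always accepted (`swap_test_accept_prob([1, 0], [1, 0])` is 1.0) and orthogonal states are accepted half the time, which is the one-sided error of 1/2 mentioned earlier. Under the assumptions above, `total_error_bound(d, r)` equals $(d/r)^{-2/3}$, which vanishes as $d/r$ grows.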

5  Discussion 

We  have  shown  that  weak  Fourier-Schur  sampling  typically  provides  insufficient 
information to solve the hidden subgroup problem. Nevertheless, it remains possible
that Schur duality could be a useful tool for the HSP. Just as weak Fourier
sampling  refines  the  space  into  smaller  subspaces  in  which  we  can  perform  strong 
Fourier  sampling,  even  when  it  alone  fails  to  solve  the  HSP,  so  we  can  use  weak 
Fourier-Schur sampling to decompose the space even further. The Schur decomposition
has the additional complication that the refined subspaces are no
longer  simply  tensor  products  of  single-copy  subspaces,  but  this  may  actually  be 
an  advantage  since  entangled  measurements  are  known  to  be  necessary  for  some 
groups. Also, Schur sampling may be useful for implementing optimal measurements,
which are typically entangled [2,3].

In  principle,  strong  Fourier-Schur  sampling  is  guaranteed  to  provide  enough 
information  to  solve  the  HSP,  simply  because  the  hidden  subgroup  states  are 
always distinguishable with $k = \mathrm{poly}(\log |G|)$ copies. However, it would be interesting
to find a new efficient quantum algorithm for some HSP based on strong
Fourier-Schur sampling. Perhaps a first step in this direction would be to analyze
the performance of measurement in a random basis, as has been studied
extensively  in  the  case  of  weak  Fourier  sampling  [19,33,35,39]. 

Moving  away  from  our  original  motivation  of  the  HSP,  the  quantum  collision 
problem  may  be  of  independent  interest.  As  discussed  in  Section  4,  our  results 
on  the  quantum  collision  sampling  problem  can  be  viewed  as  an  exploration  of 
spectrum estimation with $k = o(d^2)$ copies, but much remains unknown about
that  regime.  Many  open  problems  also  remain  regarding  variants  of  the  black 
box  version  of  the  quantum  collision  problem. 